Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.lang.c,net.unix
Subject: Re: Re: Compaction Algorithm (pack vs compress)
Message-ID: <3302@sun.uucp>
Date: Fri, 28-Feb-86 22:50:40 EST
Article-I.D.: sun.3302
Posted: Fri Feb 28 22:50:40 1986
Date-Received: Sun, 2-Mar-86 00:24:07 EST
References: <207@pierce.UUCP> <3261@sun.uucp> <226@uvacs.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 105
Xref: watmath net.lang.c:8024 net.unix:7257

> Hold on thar, Babaloo! If you mean better in terms of *byte* savings, I
> imagine ``compress'' could easily do better than ``pack''. But if you're
> talking about *block* savings, I'm dubious that ``pack'' will be much
> improved upon. I don't know about your system, but my Vax running 4.2 BSD
> permits internal fragmentation, so it's disk block savings that count.

1) I assume you "don't know about (my) system" because your news reading
program doesn't display the "Organization:" line. FYI, as it's sent out
from this site, it's "Sun Microsystems, Inc." I'm sure you can figure out
what system my machine is running from that.

2) Most operating systems do internal fragmentation. It's hardly specific
to VAXes running 4.2BSD. Even Suns running 4.2BSD do it. (It's also hardly
a question of "permitting" internal fragmentation. It's not as if UNIX gives
you a choice. If you don't want internal fragmentation, you have to put
several files together into something like an "ar" or "tar" archive.)

> Now, I'm not entirely familiar with ``compress'', but I can compare with
> ``compact''. When I created a file of 2000 bytes (one identical character
> per line plus a newline), ``compress'' boasted of > 85% compression, while
> pack only claimed 50% compression, but each of the results consumed the same
> amount of blocks. Hence the same effective compression. Big deal.

A 2000-byte file is two 1K frags. The person was asking about "load
modules", by which I presume he means executable images. "/bin/cat" is 24 1K
frags on my system ("'cat -v' considered harmful" flames to /dev/null,
please, I'm just reporting the facts). They were probably interested in
compressing large executable images, so the internal fragmentation is
probably a very small percentage of the file size, and thus the savings in
blocks is infinitesimally different from the savings in bytes.

OK, let's terminate the debate with some hard data:

Original file:

	 664 -rwxr-xr-x  1 root       671744 Feb 19 12:25 /usr/bin/suntools

"pack":

	 520 -rwxr-xr-x  1 guy        519660 Feb 28 18:33 suntools.z

	pack: suntools: 22.6% Compression
	real	1m2.98s
	user	0m46.00s
	sys	0m12.91s

"compact":

	 520 -rwxr-xr-x  1 guy        519916 Feb 28 18:55 suntools.C

	suntools:  Compression :   22.60%
	real	16m17.15s
	user	12m44.50s
	sys	0m15.15s

"compress":

	suntools: Compression: 43.18% -- replaced with suntools.Z
	real	1m39.90s
	user	1m25.65s
	sys	0m4.63s

	 384 -rwxr-xr-x  1 guy        382395 Feb 28 18:36 suntools.Z

It seems "compress" really does provide a significant improvement on the
*block* usage of the file in question, however "dubious" you may be of those
results. "compact" and "pack" get results which are infinitesimally
different.

BTW, I tried "compress", "compact", and "pack" on a 2048-byte file in the
exact same format you describe. "compress" reported a 95% compression,
"compact" reported an 81% compression, and "pack" reported an 81%
compression. All three reduced the file from two frags to one. I then
tried it on a 2000-byte file in the format you describe, and got the exact
same results as on the 2048-byte file. Your numbers look flaky.

> Squeezing a few extra bytes out of a file can only be worth it if it results
> in reducing by the 1K minimum data block size (2 basic file system blocks).
> Is this often the case?

Yes, even on 2000-byte files consisting of 1000 identical lines.
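
The arithmetic behind the byte-versus-block argument can be checked with a
rough sketch like the one below. It assumes the 1K frag size mentioned
above and takes the block counts and byte sizes straight from the first and
fifth columns of the listings. The names LISTINGS, frags_needed, saving and
savings_table are made up for illustration. The simple ceiling in
frags_needed ignores whatever else the file system charges to a file, so
for the large file the sketch uses the block counts the listings reported
rather than recomputing them.

```python
# Figures copied from the listings in the post: (blocks listed, size in bytes).
# All names here are hypothetical helpers, not part of any tool discussed.
LISTINGS = {
    "original": (664, 671744),
    "pack":     (520, 519660),
    "compact":  (520, 519916),
    "compress": (384, 382395),
}

FRAG_SIZE = 1024  # 1K frags, as stated in the post


def frags_needed(size_bytes, frag_size=FRAG_SIZE):
    """Smallest number of whole frags that can hold size_bytes."""
    return -(-size_bytes // frag_size)  # ceiling division


def saving(before, after):
    """Fraction saved when going from 'before' to 'after'."""
    return (before - after) / before


def savings_table(listings=LISTINGS):
    """Map each program to (byte saving, block saving) versus the original."""
    orig_blocks, orig_bytes = listings["original"]
    return {
        name: (saving(orig_bytes, size), saving(orig_blocks, blocks))
        for name, (blocks, size) in listings.items()
        if name != "original"
    }


if __name__ == "__main__":
    for name, (by_bytes, by_blocks) in savings_table().items():
        print(f"{name:9s} bytes saved {by_bytes:6.1%}   blocks saved {by_blocks:6.1%}")
    # Small-file case: both sizes need two frags before compression.
    print("2000-byte file:", frags_needed(2000), "frags")
    print("2048-byte file:", frags_needed(2048), "frags")
```

For a file this size the two savings columns should come out within about a
percentage point of each other, which is the point made above: internal
fragmentation only matters when the file is a handful of frags long.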

> (On my system, ``pack'' runs considerably faster than ``compact'', so the
> choice is easy.)

Since "pack" also gives results which are as good as those "compact" gives,
the choice is *very* easy unless you need something which makes only one
pass over its input (e.g. because it's reading from a pipe), in which case
"pack" won't cut it.

"compress" is only a small amount slower than "pack" in CPU usage (at least
when compared to the speed difference between "pack" and "compact"), gives
much better results than "pack" or "compact", and makes only one pass over
its input. The only disadvantage is that it tends to eat virtual (and
physical) memory; as Peter Honeyman once put it, "more than two compresses
makes disks dance!" I don't care, since my machine is a one-user machine,
but on a multi-user machine this may make a difference. I'm also not sure
whether the latest "compress" uses memory that freely.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa	(yes, really)