Xref: utzoo alt.sources.d:1422 alt.flame:27845
Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!apple!vsi1!zorch!xanthian
From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan)
Newsgroups: alt.sources.d,alt.flame
Subject: Re: REPOST lharc102A Part 01/04 BSD Unix to Amiga archives
Message-ID: <1991Feb1.091905.7715@zorch.SF-Bay.ORG>
Date: 1 Feb 91 09:19:05 GMT
References: <7563@sugar.hackercorp.com> <1991Jan23.071609.1401@zorch.SF-Bay.ORG> <1991Jan31.034127.18393@metapro.DIALix.oz.au>
Organization: SF-Bay Public-Access Unix
Lines: 170

bernie@metapro.DIALix.oz.au (Bernd Felsche) writes:

> De-flamed deliberately.

Spoil sport.

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

>> Second, "compress,uuencode,recompress" is not the best use of
>> technology; I did a little test with the same files in just one big
>> shar, to simplify the reporting of the results:

> WHOAH THERE! Shouldn't you be using tar to generate the archive
> instead of shar? Its wrapper information is more compact and
> efficient.

It is more efficient still, because putting everything in one big file
lets compression proceed across file boundaries rather than starting
fresh at each file, but filewise storage is nearly as efficient.

> Then you compress the tar archive... and uuencode it. Please try this
> and publish the results for comparison.

You had to ask; well, I was sitting home grumpy because I was too sick
to make the party tonight, so why not:

-------------------------------------------------------------------------

original data:

     3091 Makefile
     3841 amiga_patch
     2885 generic_patch
    11521 lh.doc.japanese
     2800 lh.inst.japanese
     6783 lh.n.japanese
    13133 lhadd.c
    29556 lharc.c
     7568 lharc.doc.posted
    11220 lharc.doc.revised
     9279 lharc.h
     9588 lharc.l
     2010 lhdir.c
      886 lhdir.h
     6154 lhext.c
     6504 lhio.c
     1483 lhio.h
     6672 lhlist.c
    22476 lzhuf.c
     1229 read.me_1
      486 read.me_2
     1770 read.me_3

original data size, total of file sizes (from wc -c):

   160935 lha

three files uuencoded because they contain control characters:

    15910 lh.doc.japanese.uu
     3895 lh.inst.japanese.uu
     9376 lh.n.japanese.uu

original data size but with those three uuencodings instead:

   169012 lha3uu

Plan a: just running shar on the original files. This is unworkable;
shars with control characters won't unpack reliably:

   176274 lha.sh

Plan b: current net practice; shar, compress:

   184153 lha3uu.sh           shar, three files uuencoded, rest plain text;
    82885 lha3uu.sh.Z         its size as transmitted after compression

Plan c: other current net practice; tar, compress, uuencode, compress:

   180224 lha.tar             original data tarred - not transmittable, so
    73149 lha.tar.Z           compress it and
   100810 lha.tar.Z.uu        uuencode it for safety;
    91533 lha.tar.Z.uu.Z      its size as transmitted after compression

Plan d: improve plan b by replacing compress with lharc, uuencode,
compress:

    63604 lha3uu.sh.lzh       lharc of shar file is binary
    87666 lha3uu.sh.lzh.uu    must be uuencoded to hide control characters;
    79863 lha3uu.sh.lzh.uu.Z  its size as transmitted after compression

Plan e: improve plan c by replacing first compress by lharc:

    56476 lha.tar.lzh         lharc of tar file is binary
    77844 lha.tar.lzh.uu      must be uuencoded to hide control characters;
    70839 lha.tar.lzh.uu.Z    its size as transmitted after compression

Plan f: improve plan c by replacing tar | compress by lharc:

    56944 lha.lzh             lharc of original files is binary
    78484 lha.lzh.uu          must be uuencoded to hide control characters;
    71211 lha.lzh.uu.Z        its size as transmitted after compression

(Note: plan c is not the same as simple news transmission, where
tar | compress | transmit | uncompress | untar is the paradigm; that
process is not required to create a news article as an intermediate
product, whereas plans b to f must and do.)

Note: zoo could also have been used wherever lharc was, but lharc
compresses better, and so dominates the zoo data.

Results:

        Costs in bytes

      Data    Telecomm
   storage      volume   Plan

    184153       82885   b: partial uuencode, shar, compress
    100810       91533   c: tar, compress, uuencode, compress
     87666       79863   d: partial uuencode, shar, lharc, uuencode, compress
     77844       70839   e: tar, lharc, uuencode, compress
     78484       71211   f: lharc, uuencode, compress

The absolute storage champion is plan e, but plan f is nearly as good,
and requires one fewer tool; neither of the current plans, nor plan d,
has a lot to recommend it. The choice between e and f should be made
mostly on economic grounds.

-------------------------------------------------------------------------

> Depending on software versions, you can do all this in a pipe (which
> you undoubtedly know) "tar cf - files | compress | uuencode
> >bugs.tar.Z.uu"

> For transmission, it can be compressed again, (it would be smarter to
> uudecode) though this _should_ be done by a network layer, even though
> it often isn't. Wouldn't it be nice if modem transfer protocols were
> smart enough to compress on the fly?

>> So in fact, for the files being sent, there is some modest _gain_ in
>> telecommunications efficiency by using the best compression
>> technology on text, and then uuencoding it and letting the standard
>> net node to node compression have its way with the files.

> Agreed. In fact, the more text, the better the gain.
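
The uuencoded sizes reported above can be checked by arithmetic. The
sketch below is only an illustration, not part of the test as it was
run: it assumes the classic uuencode layout (45 input bytes per
62-byte output line, a "begin MODE NAME" header, a zero-length
terminator line and an "end" line), a three-digit mode such as 644,
and that the name in the header is the input file name. The function
name uuencoded_size is made up for this example.

```python
def uuencoded_size(nbytes, name, mode="644"):
    """Bytes of classic uuencode output for nbytes of input (assumed layout)."""
    full_lines, remainder = divmod(nbytes, 45)
    # Each full line: 1 length character + 60 data characters + newline.
    size = full_lines * 62
    if remainder:
        groups = -(-remainder // 3)      # ceiling: 3 input bytes per 4 chars
        size += 1 + 4 * groups + 1       # length char + data + newline
    size += len(f"begin {mode} {name}\n")  # header line
    size += 2                              # zero-length terminator line
    size += 4                              # "end\n"
    return size


# (input size, input name, uuencoded size as posted)
POSTED = [
    (73149, "lha.tar.Z", 100810),
    (63604, "lha3uu.sh.lzh", 87666),
    (56476, "lha.tar.lzh", 77844),
    (56944, "lha.lzh", 78484),
]

if __name__ == "__main__":
    for nbytes, name, posted in POSTED:
        print(name, uuencoded_size(nbytes, name), posted)
```

Under those assumptions the computed sizes agree with the posted ones,
which says the uuencode step costs a fixed factor of 62/45, or roughly
38 percent, no matter which compressor ran before it.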

>> I have yet to see a single argument for the present methods that
>> comes down, at the last, to anything but sheer laziness on the part
>> of those who don't want to change their habits. Compressed, uuencoded
>> transmission methods win on every reasonable criterion.

> Although one should be wary of zoo archives, which don't work well if
> there are many small text files in it (i.e. typical source code).
> Compression can be as little as 10-15%, which uuencoding explodes past
> the original size.

Yeah, lharc is _much_ better at compressing small files than is zoo,
which is why putting a shar or tar wrapper around them and then zooing
them looks better than zooing them separately.

>> By the way, it is _not_ a solution to replace compress with a filter
>> form of lharc as the typical file compressor for telecommunications;
>> lharc is _much_ too slow to use at every step along the way, so it
>> needs to be done just once at the originating site to accomplish
>> these savings.

> TANSTAFL.

Kent, the man from xanth.
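
On the point that uuencoding can explode a weakly compressed archive
past the original size, a rough break-even figure can be worked out.
This is a sketch under a simplifying assumption: it counts only the
62/45 line expansion of classic uuencode and ignores the header, the
trailer and the partial last line. The names UU_EXPANSION,
BREAK_EVEN_SAVING and smaller_after_uuencode are invented for the
illustration.

```python
# Classic uuencode turns every 45 input bytes into a 62-byte line.
UU_EXPANSION = 62 / 45

# A compressor must save more than this fraction of the original size,
# or the uuencoded archive ends up larger than the plain files.
BREAK_EVEN_SAVING = 1 - 45 / 62


def smaller_after_uuencode(saving):
    """True if an archive that saves `saving` (0..1) is still smaller
    than the original once it has been uuencoded."""
    return (1 - saving) * UU_EXPANSION < 1.0


if __name__ == "__main__":
    print(f"break-even saving: {BREAK_EVEN_SAVING:.1%}")
    for saving in (0.10, 0.15, 0.30, 0.65):
        print(f"{saving:.0%} saved ->", smaller_after_uuencode(saving))
```

By this estimate an archiver has to save a bit over a quarter of the
original size before uuencoding stops being a net loss, so savings of
10-15% fall short, while the lharc archives in the test above, which
save roughly 65%, clear it easily.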