Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!nrl-cmf!umix!delrio!usenet
From: usenet@delrio.cc.umich.edu (Usenet News)
Newsgroups: news.misc
Subject: Re: Expansion after compression for MS-DOS arc files
Message-ID: <386ce279.c6e5@delrio.cc.umich.edu>
Date: Thu, 12-Nov-87 00:41:32 EST
Article-I.D.: delrio.386ce279.c6e5
Posted: Thu Nov 12 00:41:32 1987
Date-Received: Sun, 15-Nov-87 03:05:41 EST
References: <3346@uwmcsd1.UUCP> <2255@mcdchg.UUCP>
Reply-To: hyc@starbarlounge.cc.umich.edu (Howard Chu)
Distribution: all
Organization: University of Michigan Computing Center, Ann Arbor
Lines: 33
UUCP-Path: {uunet,rutgers}!umix!hyc

In article <2255@mcdchg.UUCP> heiby@mcdchg.UUCP (Ron Heiby) writes:
%Christopher N Maag (cmaag@csd4.milw.wisc.edu.UUCP) writes:
%> "If you submit a file to one of the newsgroups and you wish to uuencode
%> the file, _do not_ perform any type of file compression to this
%> file before uuencoding it. This means don't arc the file, (insert
%> other popular compression schemes for other computer systems here).
%> If you do compress the file, it will actually get _larger_ when it is
%> sent than it was originally. This costs us all money."
%
%I think Chris is going further than I suggested. I have no evidence that
%the problem is compressing before uuencoding, and I suspect that it has
%little to do with it. I was talking about the difference between sending
%clear text and sending compressed/uuencoded text. I think it would be
%interesting to check on what Chris is suggesting and get some numbers on
%the difference between sending uuencoded binary files vs uuencoded compressed
%binary files. I suspect that a uuencoded compressed binary file would
%actually be smaller, but the further impact of the news software's compress
%on the resulting files is unknown.
%--
%Ron Heiby, heiby@mcdchg.UUCP Moderator: comp.newprod & comp.unix
%"I know engineers. They love to change things." McCoy

In fact the results *are* known... This comes up every couple months, and
the plain fact is that running the compress algorithm twice on a piece of
data *WILL* generate a larger file. Generally 30% larger. This will happen
with both ARC files and files compressed by compress (4.0). They don't use
identical algorithms, but both use modified Lempel-Ziv encoding schemes,
and both react in the same way to being 'run over themselves.'

Note - this is on the binary data itself. If you uuencode a compressed
file, you will probably win in the long run. Figure about 40% compression,
and 25% expansion, and *then* on some sites you'll get more compression
during actual transit. Since the Lempel-Ziv scheme works so well on strings
of printable text, in fact, it might be the optimal solution to post
binaries as uuencoded compressed data...
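
For anyone who wants to get some numbers of their own, here is a rough Python sketch of the comparison. It is an illustration under stated assumptions, not a reproduction of what compress (4.0) or ARC do: zlib (DEFLATE) stands in for the Lempel-Ziv variants named above, so the second-pass growth it reports will not match the 30% figure, and the sample text and the helper names (uuencode_body, sizes, net_size_percent) are made up for the example. The last helper just redoes the back-of-the-envelope arithmetic with the 40% and 25% estimates from the text.

```python
import binascii
import zlib


def uuencode_body(data: bytes) -> bytes:
    """Encode data as uuencode body lines (45 input bytes per line).

    The "begin"/"end" framing lines are left out; only the payload is sized.
    """
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def sizes(data: bytes) -> dict:
    """Return byte counts for the variants discussed in the article.

    ASSUMPTION: zlib is a stand-in for compress/ARC, not the same algorithm.
    """
    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)  # compressing already-compressed data
    return {
        "original": len(data),
        "compressed_once": len(once),
        "compressed_twice": len(twice),
        "uuencoded_raw": len(uuencode_body(data)),
        "uuencoded_compressed": len(uuencode_body(once)),
    }


def net_size_percent(compression_pct: int, expansion_pct: int) -> float:
    """Size after compressing then expanding, as a percent of the original."""
    return (100 - compression_pct) * (100 + expansion_pct) / 100


# Hypothetical sample input: repetitive printable text.
sample = b"".join(
    b"line %d of some fairly repetitive sample text\n" % i for i in range(2000)
)
report = sizes(sample)
for name, size in report.items():
    print(f"{name:22s}{size:8d}")

# The article's estimate: 40% compression followed by 25% expansion.
print("net size, percent of original:", net_size_percent(40, 25))
```

Feeding it a real binary instead of the sample text is the more interesting test; the sizes will depend heavily on the input and on the compressor used.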