Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site turtlevax.UUCP
Path: utzoo!utcs!lsuc!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.dcom,net.micro
Subject: Re: Squeezing files.
Message-ID: <789@turtlevax.UUCP>
Date: Wed, 19-Jun-85 15:44:58 EDT
Article-I.D.: turtleva.789
Posted: Wed Jun 19 15:44:58 1985
Date-Received: Thu, 20-Jun-85 03:57:19 EDT
References: <1414@ecsvax.UUCP> <784@turtlevax.UUCP> <1861@ukma.UUCP>
Reply-To: ken@turtlevax.UUCP (Ken Turkowski)
Organization: CADLINC, Inc. @ Menlo Park, CA
Lines: 37
Xref: utcs net.dcom:1045 net.micro:10496

In article <1861@ukma.UUCP> sean@ukma.UUCP (Sean Casey) writes:
>In article <784@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>>I think you should consider changing to Lempel-Ziv Compression (posted
>>to the net as "compress", version 3.0), which normally gives 70%
>>compression (30% of original size) to text. The program is fast, and
>>adapts to whatever type of data you give it, unlike static Huffman
>>coding. It usually produces 90% (!) compression on binary images.
>
>WHOA BUDDY!
>
>Lempel-Ziv doesn't do NEARLY that well. We've been using it for
>months, and we've found that text and program sources usually get about
>55-65% compression, while binaries get about 45-55% compression. This
>is encountered in the optimal case of compressing a large archive of
>files. As files get smaller, expecially as they drop below about 8k in
>size, compression worsens. I seriously doubt that most binaries contain
>only 10% of unambiguous information, much less being compressable to
>that size.

I can see that we have a semantic problem here. By "image", I mean a
picture, or two-dimensional signal. By "binary", I mean ones and zeros,
black and white, no grey-scale, no color. A binary image is then a
coarsely quantized picture, with lots of runs of zeros and ones.
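
As a rough illustration of why long runs matter, here is a minimal
Python sketch of LZW, the Lempel-Ziv variant that "compress" is based
on. It is not the posted "compress" source: the function names, the
code-width rule (9 bits growing to 16), and the synthetic 512x512
bilevel picture (a black rectangle on white, packed 8 pixels per byte)
are all assumptions made for this example.

```python
def lzw_compressed_bits(data: bytes, max_bits: int = 16) -> int:
    """Estimate how many bits a simple LZW encoder would emit for `data`.

    Hypothetical helper: it only counts output bits, it does not
    produce a byte stream compatible with any real tool.
    """
    table = {bytes([i]): i for i in range(256)}  # single-byte phrases
    next_code = 256
    width = 9          # current code width in bits (assumed rule)
    bits = 0
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep extending the match
            continue
        bits += width                    # emit the code for `current`
        if next_code < (1 << max_bits):  # learn the new, longer phrase
            table[candidate] = next_code
            next_code += 1
            if next_code > (1 << width) and width < max_bits:
                width += 1
        current = bytes([value])
    if current:
        bits += width                    # flush the final phrase
    return bits


def savings(data: bytes) -> float:
    """Fraction of the original size removed (0.9 means 90% compression)."""
    compressed_bytes = (lzw_compressed_bits(data) + 7) // 8
    return 1.0 - compressed_bytes / len(data)


def synthetic_bilevel_image(width: int = 512, height: int = 512) -> bytes:
    """White page (0 bits) with one black rectangle (1 bits): an assumed test picture."""
    row_bytes = width // 8
    rows = []
    for y in range(height):
        row = bytearray(row_bytes)
        if 100 <= y <= 300:
            for x in range(10, 41):
                row[x] = 0xFF            # 8 black pixels per byte
        rows.append(bytes(row))
    return b"".join(rows)


image = synthetic_bilevel_image()
print(f"bilevel image savings: {savings(image):.1%}")
```

On run-heavy input like this the savings should land well above 90%,
because each new phrase in the table covers a longer run than the last.
Text and machine code have far fewer long runs, which is consistent
with the lower figures quoted above.
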
L-Z does exceptionally well on this type of data, and I will reiterate
my claim of 90% average compression.

As far as program source code and executable machine code go, I get the
same compression ratios as you.

I'm curious: what is the etymology of the word "binary" as it is
sometimes used to refer to executable machine code? And why does it
imply program rather than data?
-- 
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,hplabs,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA