Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!mips!pacbell.com!att!emory!ox.com!umich!gumby!wmu-coyote!campbell From: campbell@sol.cs.wmich.edu (Paul Campbell) Newsgroups: comp.compression Subject: Re: Is there better lossless compression than LZW? Message-ID: <1991May29.234224.2335@sol.cs.wmich.edu> Date: 29 May 91 23:42:24 GMT References: <2722@pdxgate.UUCP> <1991May28.231022.11197@unislc.uucp> Organization: Western Michigan Univ. Comp. Sci. Dept. Lines: 45 The only other improvement you can make beyond LZ derivatives and Arithmetic encoding is either hybrids or heuristics. For instance, you could do some sort of wierd algorithm where you create a half adaptive LZ and half fixed LZ encoding scheme, which is a heuristic approach. If someone can find a neat, intelligent way to do this, that would be great. It is obvious that there is something to be had here when you do an unbounded adaptive LZ compression, but keep the table and then run it again using the fixed table. Then you can also remove the unused entries. On something like English text, this method just might result in even high compression ratios, but this is speculation. The coders on here might want to try this. I have never had either the software, the time, or the machine to try this. The other possibility is hybrids. Lots of MS-DOG and other compressors are based off the idea of combining 2 or more compressors. For instance, a lot of them will use just Run Length encoding to remove the repitition from a file. It does very well when combined with Arithmetic or Huffman encoding, and a little bit for LZ stuff. Almost universally, though, they take an LZ compression with a low maximum table size (10-13 bits on the average) and feed the resulting codes to an Arithmetic compressor for further compression, which actually does achieve good results. The only alternative is to know something about the data you are compressing. With sound and graphics, you can go to a lossy compression. With text, you can start with the system dictionary or a phoneme table to do an initial compression (even n-grams help here) and then hit it with LZ. The real advantage to dictionary compression without any other compression is that you can still do keyword and other types of searches on the text. Under Minix, it compresses all assembly language outputs by turning the mnemonics into single tokens that are in the range of $80+ right in the text stream. I forgot the guy's name, but he's applying the same thing he did to fractals onto graphics and getting phenomonal compression ratios (10,000:1). He has a simple formula where a major class of fractals can be represented with about 8 parameters. Then he does pattern matching to try to find a set of fractals that match the image. Don't forget the other major advantage of data compression: until we can use memory chips for everything, you've still got to deal with slow storage devices. If you can compress something to half it's original size, with the amount of time that the CPU spends idle waiting for loads (you could also use a dedicated sub-processor for this on a Unix box some day), you will double your throughput by compressing/decompressing the data stream. BBS users are already reaping the benefits from this because almost all BBS's store files in some sort of compressed format and transmit it over very slow 2400 bps lines in most cases. When I'm waiting for a file transfer, I have plenty of time to think of ways to get it to my machine faster, and the protocol twiddling has pretty much reached a zenith.