Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!mips!pacbell.com!att!emory!ox.com!umich!gumby!wmu-coyote!campbell
From: campbell@sol.cs.wmich.edu (Paul Campbell)
Newsgroups: comp.compression
Subject: Re: Is there better lossless compression than LZW?
Message-ID: <1991May29.234224.2335@sol.cs.wmich.edu>
Date: 29 May 91 23:42:24 GMT
References: <2722@pdxgate.UUCP> <1991May28.231022.11197@unislc.uucp>
Organization: Western Michigan Univ. Comp. Sci. Dept.
Lines: 45

The only other improvement you can make beyond LZ derivatives and Arithmetic
encoding is either hybrids or heuristics. For instance, you could do some sort
of wierd algorithm where you create a half adaptive LZ and half fixed LZ
encoding scheme, which is a heuristic approach. If someone can find a neat,
intelligent way to do this, that would be great. It is obvious that there is
something to be had here when you do an unbounded adaptive LZ compression, but
keep the table and then run it again using the fixed table. Then you can also
remove the unused entries. On something like English text, this method just
might result in even high compression ratios, but this is speculation. The
coders on here might want to try this. I have never had either the software,
the time, or the machine to try this.
	The other possibility is hybrids. Lots of MS-DOG and other compressors
are based off the idea of combining 2 or more compressors. For instance, a lot
of them will use just Run Length encoding to remove the repitition from a file.
It does very well when combined with Arithmetic or Huffman encoding, and a
little bit for LZ stuff. Almost universally, though, they take an LZ
compression with a low maximum table size (10-13 bits on the average) and
feed the resulting codes to an Arithmetic compressor for further compression,
which actually does achieve good results.
	The only alternative is to know something about the data you are
compressing. With sound and graphics, you can go to a lossy compression. With
text, you can start with the system dictionary or a phoneme table to do an
initial compression (even n-grams help here) and then hit it with LZ. The real
advantage to dictionary compression without any other compression is that you
can still do keyword and other types of searches on the text.
	Under Minix, it compresses all assembly language outputs by turning the
mnemonics into single tokens that are in the range of $80+ right in the text
stream.
	I forgot the guy's name, but he's applying the same thing he did to
fractals onto graphics and getting phenomonal compression ratios (10,000:1).
He has a simple formula where a major class of fractals can be represented with
about 8 parameters. Then he does pattern matching to try to find a set of
fractals that match the image.

	Don't forget the other major advantage of data compression: until we
can use memory chips for everything, you've still got to deal with slow storage
devices. If you can compress something to half it's original size, with the
amount of time that the CPU spends idle waiting for loads (you could also use
a dedicated sub-processor for this on a Unix box some day), you will double
your throughput by compressing/decompressing the data stream. BBS users are
already reaping the benefits from this because almost all BBS's store files in
some sort of compressed format and transmit it over very slow 2400 bps lines
in most cases. When I'm waiting for a file transfer, I have plenty of time
to think of ways to get it to my machine faster, and the protocol twiddling
has pretty much reached a zenith.