Newsgroups: comp.compression
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!wuarchive!uunet!viusys!uxui!unislc!ttobler
From: ttobler@unislc.uucp (Trent Tobler)
Subject: Re: Is there better lossless compression than LZW?
References: <2722@pdxgate.UUCP>
Message-ID: <1991May28.231022.11197@unislc.uucp>
Organization: unisys
Date: Tue, 28 May 91 23:10:22 GMT

From article <2722@pdxgate.UUCP>, by hal@eecs.cs.pdx.edu (Aaron Harsh):
> 
>   The only lossless compression algorithms I've seen have either been LZW
> or its derivatives (AP, MW and Y-coding) or Huffman encoding.  These seem
> fast, but their compression doesn't seem that remarkable.  Are there any
> other general purpose algorithms that achieve better results, even at the
> expense of speed?
> 
What do you mean by thier compression does not seem remarkable.  For
data where the redundant information is in it's symbol frequency (huffman
encoding) or symbol string frequency (LZ type algorythms) the compression
approaches the max compression for large files.  English text has both of these
properties, and therefore good compression ratios are achieved with either
algorithms (although LZ works somewhat better because symbol strings are
common).  If redundant information exists in other forms, or the amount
of data to compress is small, then the general algorythems are somewhat less
than maximal.  There is a limit for the  compression ratio that is possible in
a specific class of data.  The 'BEST' compression algorithm is the one that
can approach this limit for that data.

There are algorithems that work better than LZ and Huffman for the
class of data they operate on.

     Full Arithmetic encoding achieves better compression than huffman
     with the drawback of being significantly slower

     Some kind of string look-ahead combined with some heuristic algorithem
     to decide if adding the string to the dictionary is useful, as well as
     intelligently removing entries from the dictionary; combined with full
     Arithmetic encoding will beat the LZ type algorithms, of course with
     the cost of being significantly slow/requiring more memory.

It is a hypothises of mine that as the ratio of compression approaches
the limit of compression for a class of data, the amount of memory and
amount of time increase.  Something along the lines of 
   m*t = 1/(d-L)
where m:-memory, t:-time, d:-desired compression, L:-maximal compression.

Anyone have any comments, or have seen something like this?

--
     Trent Tobler - ttobler@csulx.weber.edu