Newsgroups: comp.compression Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!wuarchive!uunet!viusys!uxui!unislc!ttobler From: ttobler@unislc.uucp (Trent Tobler) Subject: Re: Is there better lossless compression than LZW? References: <2722@pdxgate.UUCP> Message-ID: <1991May28.231022.11197@unislc.uucp> Organization: unisys Date: Tue, 28 May 91 23:10:22 GMT From article <2722@pdxgate.UUCP>, by hal@eecs.cs.pdx.edu (Aaron Harsh): > > The only lossless compression algorithms I've seen have either been LZW > or its derivatives (AP, MW and Y-coding) or Huffman encoding. These seem > fast, but their compression doesn't seem that remarkable. Are there any > other general purpose algorithms that achieve better results, even at the > expense of speed? > What do you mean by thier compression does not seem remarkable. For data where the redundant information is in it's symbol frequency (huffman encoding) or symbol string frequency (LZ type algorythms) the compression approaches the max compression for large files. English text has both of these properties, and therefore good compression ratios are achieved with either algorithms (although LZ works somewhat better because symbol strings are common). If redundant information exists in other forms, or the amount of data to compress is small, then the general algorythems are somewhat less than maximal. There is a limit for the compression ratio that is possible in a specific class of data. The 'BEST' compression algorithm is the one that can approach this limit for that data. There are algorithems that work better than LZ and Huffman for the class of data they operate on. Full Arithmetic encoding achieves better compression than huffman with the drawback of being significantly slower Some kind of string look-ahead combined with some heuristic algorithem to decide if adding the string to the dictionary is useful, as well as intelligently removing entries from the dictionary; combined with full Arithmetic encoding will beat the LZ type algorithms, of course with the cost of being significantly slow/requiring more memory. It is a hypothises of mine that as the ratio of compression approaches the limit of compression for a class of data, the amount of memory and amount of time increase. Something along the lines of m*t = 1/(d-L) where m:-memory, t:-time, d:-desired compression, L:-maximal compression. Anyone have any comments, or have seen something like this? -- Trent Tobler - ttobler@csulx.weber.edu