Path: utzoo!attcan!utgpu!cs.utexas.edu!uunet!lll-winken!xanth!ames!uhccux!virtue!aukuni.ac.nz!pgut1 From: pgut1@iti.org (PeterClaus Gutmann ) Newsgroups: alt.sources.d Subject: Re: Compression algorithm Keywords: compression, data storage, data transmission, trees, graphs Message-ID: Date: 28 Sep 90 06:31:46 GMT Sender: news@ccu1.aukuni.ac.nz (News Owner) Distribution: alt Organization: University of Auckland, New Zealand. Lines: 58 In <986@tharr.UUCP> gtoal@tharr.UUCP (Graham Toal) writes: >These are my thoughts on a new compression algorithm to replace Lempel- >Zev-Welch.* They are very preliminary, and I haven't coded any of it. I throw >it out for comment, and if anyone wants to modify it and/or code it up, >please feel free to do so. >(* if it becomes necessary because of the software patent issue) [Long description of compression algorithm] ................. >(Node numbers could be output using escape codes, but it would probably be >better to use something like huffman on an alphabet size which is large >enough to encompass the tree. Note that the most frequent codes, near the >bottom of the tree, have large numbers. A neat encoding scheme should >undo this effect.) >The next problem is that a tree like this is not sensible for large files; >clearly we want some sort of windowing mechanism. I suggest we pick some >maximum power of two (here we're using 16 for the sake of demonstration, >although a real program might use 16384) ................. >Happy hacking. >Graham Toal Algorithms to do this type of compression already exist. Lempel and Ziv did two compression schemes, generally referred to as LZ77 and LZ78 (after the years in which they were published in the IEEE Transactions on Information Theory). LZW is based on the LZ78 scheme, the algorithm you propose is like their LZ77 scheme. This (LZ77) algorithm is used by PKZIP and LHARC as the "front-end" for their compression. LHARC then uses adaptive Huffman coding to compress the output of the LZ compressor, PKZIP uses Shannon-Fano coding. The LZ77-based compressors generally work by using an 8K window with various data structures to handle the maximum-length string matching. I've written a compressor which seems to achieve much better results with a 16K window, at the expense of being a bit slower. There is also a very fast compression algorithm invented by Ed Fiala and Dan Greene (published in Communications of the ACM, Vol.32, No.4 (April 1989), p490) which is a sort of hybrid between LZ77 and LZ78, which uses a very fancy data structure to build a DAG (much like you are proposing). The problem with this algorithm is that, like LZW, it is also patented. The LZ77-compressors used in PKZIP and LHARC are based on the LZSS compression scheme, which is described fairly well by Tim Bell in the IEEE Trans. on Communications, Vol.34, No.12 (Dec.1986), p.1176..... waffle burble waffle (this is one of my pet topics :-) Peter. -- [In order of reliability] Peter_Gutmann@kcbbs.gen.nz or peter@nacjack.gen.nz or pgut1@compsc.aukuni.ac.nz Disclaimer: The contents of this message are irrelevant. It serves merely as a covert channel for a data leak to The Commies.