Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!ns-mx!pyrite.cs.uiowa.edu From: jones@pyrite.cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879) Newsgroups: comp.compression Subject: Re: Is arithmetic compression a crock? Message-ID: <6285@ns-mx.uiowa.edu> Date: 31 May 91 14:12:43 GMT References: <1991May30.035552.14435@alembic.acs.com> Sender: news@ns-mx.uiowa.edu Lines: 37 From article <1991May30.035552.14435@alembic.acs.com>, by csu@alembic.acs.com (Dave Mack): > > Is this a really bad implementation, a really bad algorithm, or > is arithmetic compression basically useless for text compression? Arithmetic coding has never been advertised as a fast method, and the speed you see is typical of the results others have obtained. As for compression performance, the code in the Witten Neal and Cleary paper only uses a single-state source model. To compress as well as LZW compresses, you need multiple states in your source model. Witten Neal and Cleary used a move-to-front cumulative frequency table, occupying 257 16 bit words. If you want a 64 state source model, you will need 64 of these tables (for a total cost of 16448 words or 32896 bytes. In my CACM paper on Splay Tree Based compression (vol 31 no 8), I showed that a 64 state source model (where the next state is determined by the previous character mod 64) works pretty well for splay trees, and it should work even better for arithmetic codes. The 64 state source model I suggested uses much less memory than the hash tables used by LZW, and it should compress significantly better. My splay-tree based code was quite good with this 64 state source model, frequently beating LZW, and it is a simple prefix code where each source character codes to at least one bit. In a multiple-state source model, many characters will be so common in some states that they will code to less than a bit when arithmetic codes are used. Finally, note that a multiple state source model does not slow things down significantly. You pay about one array index operation per character compressed to find the base of the sub-array to be used in compressing that character. Doug Jones jones@cs.uiowa.edu