Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!viusys!alembic!csu From: csu@alembic.acs.com (Dave Mack) Newsgroups: comp.compression Subject: Is arithmetic compression a crock? Message-ID: <1991May30.035552.14435@alembic.acs.com> Date: 30 May 91 03:55:52 GMT Organization: Alembic Computer Services, McLean VA Lines: 55 About a month ago, Kent Dolan posted a sample arithmetic coding compression package here. I finally got around to messing with it. For reference: From: xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) Newsgroups: comp.compression Subject: Re: looking for arith. codeing refs. Message-ID: <1991Apr22.130611.9492@zorch.SF-Bay.ORG> Date: 22 Apr 91 13:06:11 GMT References: <11020001@hppad.waterloo.hp.com> <19319@cs.utexas.edu> Organization: SF-Bay Public-Access Unix Lines: 598 > "Arithmetic Coding for Data Compression", Witten, Radford and Cleary, > Communications of the ACM, Volume 30, #6, June 1987 > This paper has a good description of arithmetic coding as well as > working C code. I'm sure that the code is online somewhere, but I > couldn't tell you where. I'll probably get harassed for this, but here it is. I added ANSI C prototypes, and for complicated reasons this exact Makefile is untested, and it needs unpacking on a system with long file name capability, or else hand editing to shorten the names before unpacking (and in the files, too.) So I modified the makefile to use gcc with -g and -O, then ran it on a large text file (jargon2.8.2, actually.) The results were pretty horrible: # time encode test.A 80.7 real 79.8 user 0.3 sys # time decode untest 101.1 real 98.7 user 0.5 sys # cmp test untest # ls -l test test.A -rw-r--r-- 1 csu 890267 May 29 23:19 test -rw-r--r-- 1 csu 522053 May 29 23:28 test.A This was done on a *very* lightly loaded Sun-4/110. For comparison, I ran an LZW compress over the same input: # time compress test 17.1 real 16.3 user 0.4 sys # ls -l test.Z -rw-r--r-- 1 csu 392609 May 29 23:19 test.Z # time uncompress test.Z 10.8 real 9.2 user 0.5 sys Is this a really bad implementation, a really bad algorithm, or is arithmetic compression basically useless for text compression? -- Dave Mack