Path: utzoo!attcan!uunet!cs.utexas.edu!csd4.milw.wisc.edu!bionet!ig!ames!xanth!bander
From: bander@cs.odu.edu (James Bander)
Newsgroups: comp.arch
Subject: Re: SPEC Bench-a-Thon
Summary: You cannot compress a compressed file.
Keywords: SPEC performance benchmarks systems, file compression
Message-ID: <9353@xanth.cs.odu.edu>
Date: 24 Jun 89 18:26:04 GMT
References: <22031@abbott.mips.COM> <22033@abbott.mips.COM> <670@biar.UUCP>
Reply-To: bander@cs.odu.edu (James Bander)
Organization: Old Dominion University, Norfolk Va.
Lines: 51

In article <670@biar.UUCP> trebor@biar.UUCP (Robert J Woodhead) writes:
>
> Well, if you want to get elegant, have compress compress it's
> own source code, then compress the compressed file, and repeat
> the process N times... ;^)

That does sound elegant; unfortunately, it would not be a fair test.

File compression rests on information theory: a compressor shrinks a
file by removing redundancy.  Compression programs work because English
text, programs, data files, and so forth contain plenty of redundant
information; languages invented by human beings are built that way.
The second time a file is compressed there is little redundancy left,
so the compression program has far less to do.

The Unix compress program implements an adaptive Lempel-Ziv algorithm
(the LZW variant, which builds its string dictionary on the fly as it
reads the input).  Lempel and Ziv proved that their scheme is
asymptotically optimal: as the input grows, the size of the compressed
output approaches the entropy of the source, so essentially no
redundancy survives the first pass.  Even for smaller files, Lempel-Ziv
achieves no compression the second time.  Try it--take a file, run it
through compress, rename the result so it no longer has the .Z
extension, and run it through compress again (a quick demonstration of
the same effect appears in the postscript below).  Unless forced with
-f, compress refuses to generate output on the second pass, because the
"compressed" result would be no smaller than its input.  (Even if it
did generate output the same size as the input, the program would have
done less computation on the compressed file than on a random- or
human-generated file.)

References:

J. A. Storer, Data Compression: Methods and Theory, Computer Science
Press, 1988.

J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data
Compression," IEEE Transactions on Information Theory 23:3 (1977),
337-343.

J. Ziv and A. Lempel, "Compression of Individual Sequences via
Variable-Rate Coding," IEEE Transactions on Information Theory 24:5
(1978), 530-536.

-----------------------------------------------------------------------------
Jim Bander, Tidewater Consultants, Inc., Virginia Beach, VA
Internet: bander@cs.odu.edu        Ma Bell: (804) 497-8951
USnail: 160 Newtown Rd. #415, Va. Beach, VA 23462
"The presence of humans, in a system containing high-speed electronic
computers and high-speed, accurate communications, is quite inhibiting."
                                              --Stuart L. Seaton, 1958
-----------------------------------------------------------------------------
Disclaimer: All my mistaken opinions are my own.
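
P.S.  If you don't have compress handy, you can watch the same effect
with any Lempel-Ziv-family compressor.  The sketch below is only an
illustration, not part of the benchmark discussion above: it assumes a
system with Python and its standard zlib module (a DEFLATE/LZ77-style
compressor, not the LZW used by compress), and "input.txt" is a
placeholder for any redundant human-generated file.

    import zlib

    # Any redundant, human-generated file will do.
    data = open("input.txt", "rb").read()

    once = zlib.compress(data)    # first pass: big savings on text
    twice = zlib.compress(once)   # second pass: nothing left to remove

    print(len(data), len(once), len(twice))
    # Expect len(once) to be far below len(data), while len(twice)
    # typically comes out slightly LARGER than len(once): the first
    # pass stripped the redundancy, so the second pass can do no
    # better than store its input plus a few bytes of overhead.

That last comment is the whole point: a second compression pass is not
a representative workload, because the compressor is handed input with
no redundancy left to find.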