Path: utzoo!attcan!uunet!cs.utexas.edu!csd4.milw.wisc.edu!bionet!ig!ames!xanth!bander
From: bander@cs.odu.edu (James Bander)
Newsgroups: comp.arch
Subject: Re: SPEC Bench-a-Thon
Summary: You cannot compress a compressed file.
Keywords: SPEC performance benchmarks systems, file compression
Message-ID: <9353@xanth.cs.odu.edu>
Date: 24 Jun 89 18:26:04 GMT
References: <22031@abbott.mips.COM> <22033@abbott.mips.COM> <670@biar.UUCP>
Reply-To: bander@cs.odu.edu (James Bander)
Organization: Old Dominion University, Norfolk Va.
Lines: 51

In article <670@biar.UUCP> trebor@biar.UUCP (Robert J Woodhead) writes:
>
> Well, if you want to get elegant, have compress compress it's
> own source code, then compress the compressed file, and repeat
> the process N times... ;^)

That does sound elegant; unfortunately, it would not be a fair test.

File compression rests on information theory: a compressor shrinks a
file by removing redundancy.  Compression programs work because English
text, programs, data files, and so forth contain plenty of redundant
information; languages invented by human beings are built that way.
The second time a file is compressed there is little redundancy left,
so the compression program has far less to do.

The Unix compress program implements an adaptive Lempel-Ziv algorithm
(the LZW variant, which builds its string dictionary on the fly as it
reads the input).  Lempel and Ziv proved that their scheme is
asymptotically optimal: as the input grows, the size of the compressed
output approaches the entropy of the source, so essentially no
redundancy survives the first pass.  Even for smaller files, Lempel-Ziv
achieves no compression the second time.  Try it--take a file, run it
through compress, rename the result so it no longer has the .Z
extension, and run it through compress again (a quick demonstration of
the same effect appears in the postscript below).  Unless forced with
-f, compress refuses to generate output on the second pass, because the
"compressed" result would be no smaller than its input.  (Even if it
did generate output the same size as the input, the program would have
done less computation on the compressed file than on a random- or
human-generated file.)

References:

J. A. Storer, Data Compression: Methods and Theory, Computer Science
Press, 1988.

J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data
Compression," IEEE Transactions on Information Theory 23:3 (1977),
337-343.

J. Ziv and A. Lempel, "Compression of Individual Sequences via
Variable-Rate Coding," IEEE Transactions on Information Theory 24:5
(1978), 530-536.

-----------------------------------------------------------------------------
Jim Bander, Tidewater Consultants, Inc., Virginia Beach, VA
Internet: bander@cs.odu.edu        Ma Bell: (804) 497-8951
USnail: 160 Newtown Rd. #415, Va. Beach, VA 23462
"The presence of humans, in a system containing high-speed electronic
computers and high-speed, accurate communications, is quite inhibiting."
                                              --Stuart L. Seaton, 1958
-----------------------------------------------------------------------------
Disclaimer: All my mistaken opinions are my own.
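
P.S.  If you don't have compress handy, you can watch the same effect
with any Lempel-Ziv-family compressor.  The sketch below is only an
illustration, not part of the benchmark discussion above: it assumes a
system with Python and its standard zlib module (a DEFLATE/LZ77-style
compressor, not the LZW used by compress), and "input.txt" is a
placeholder for any redundant human-generated file.

    import zlib

    # Any redundant, human-generated file will do.
    data = open("input.txt", "rb").read()

    once = zlib.compress(data)    # first pass: big savings on text
    twice = zlib.compress(once)   # second pass: nothing left to remove

    print(len(data), len(once), len(twice))
    # Expect len(once) to be far below len(data), while len(twice)
    # typically comes out slightly LARGER than len(once): the first
    # pass stripped the redundancy, so the second pass can do no
    # better than store its input plus a few bytes of overhead.

That last comment is the whole point: a second compression pass is not
a representative workload, because the compressor is handed input with
no redundancy left to find.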