Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!bionet!agate!garnet.berkeley.edu!greg
From: greg@garnet.berkeley.edu (Greg Kuperberg)
Newsgroups: comp.compression
Subject: You can't get something for nothing.
Message-ID: <1991Apr10.193417.28193@agate.berkeley.edu>
Date: 10 Apr 91 19:34:17 GMT
References: <5351@ns-mx.uiowa.edu> <4931@pink1.UUCP>
Sender: usenet@agate.berkeley.edu (USENET Administrator)
Reply-To: greg@math.berkeley.edu
Organization: U.C. Berkeley
Lines: 32

In article <4931@pink1.UUCP> beville@motcid.UUCP (Anthony T. Beville) writes:
>It seems to me that a scheme using the the digits of
>and irrational number ( or a repeating rational number, for that
>matter) might be able to yeild some really good compression
>ratios.  
>
>For any given sequence of bits longer than some number,  
>merely (!) find a matching sequence deep within the bowels 
>of pi or sqrt(2) or some fraction or whatever, and represent
>that sequence with a greatly reduced number of bits.  

(I'm not sure if this is a serious proposal, but I'll err on the
side of courtesy and assume that it is.)

This the kind of algorithm can't possibly work, because it assumes
nothing about the input stream.  A data compression scheme which
shortens some data streams must necessarily lengthen others (if only by
a small amount).

Some data compression algorithms, like Lempel-Ziv, are called
"universal", as if to suggest that they assume nothing about the input
stream.  This is not the case.  Lempel-Ziv only shortens messages with a
bias in the distribution of bits or strings of bits.  It is universal
in the sense that it makes no prior assumption about what that bias is,
only that it exists.

The catch in this particular case is that most sequences of bits
occur so far down in the expansion of pi or sqrt(2) that you will need
at least as many bits as you started with to describe their locations.
----
Greg Kuperberg
greg@math.berkeley.edu