Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!bionet!agate!garnet.berkeley.edu!greg From: greg@garnet.berkeley.edu (Greg Kuperberg) Newsgroups: comp.compression Subject: You can't get something for nothing. Message-ID: <1991Apr10.193417.28193@agate.berkeley.edu> Date: 10 Apr 91 19:34:17 GMT References: <5351@ns-mx.uiowa.edu> <4931@pink1.UUCP> Sender: usenet@agate.berkeley.edu (USENET Administrator) Reply-To: greg@math.berkeley.edu Organization: U.C. Berkeley Lines: 32 In article <4931@pink1.UUCP> beville@motcid.UUCP (Anthony T. Beville) writes: >It seems to me that a scheme using the the digits of >and irrational number ( or a repeating rational number, for that >matter) might be able to yeild some really good compression >ratios. > >For any given sequence of bits longer than some number, >merely (!) find a matching sequence deep within the bowels >of pi or sqrt(2) or some fraction or whatever, and represent >that sequence with a greatly reduced number of bits. (I'm not sure if this is a serious proposal, but I'll err on the side of courtesy and assume that it is.) This the kind of algorithm can't possibly work, because it assumes nothing about the input stream. A data compression scheme which shortens some data streams must necessarily lengthen others (if only by a small amount). Some data compression algorithms, like Lempel-Ziv, are called "universal", as if to suggest that they assume nothing about the input stream. This is not the case. Lempel-Ziv only shortens messages with a bias in the distribution of bits or strings of bits. It is universal in the sense that it makes no prior assumption about what that bias is, only that it exists. The catch in this particular case is that most sequences of bits occur so far down in the expansion of pi or sqrt(2) that you will need at least as many bits as you started with to describe their locations. ---- Greg Kuperberg greg@math.berkeley.edu