Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!rpi!sarah!bingnews!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.compression
Subject: Re: theoretical compression factor
Message-ID: <1991Apr2.034441.28170@bingvaxu.cc.binghamton.edu>
Date: 2 Apr 91 03:44:41 GMT
References: <20135@alice.att.com> <1991Mar29.031127.9128@bingvaxu.cc.binghamton.edu> <20144@alice.att.com>
Organization: State University of New York at Binghamton
Lines: 38

In article <20144@alice.att.com> jj@alice.UUCP (jj, like it or not) writes:
>What you're doing is building a model.  If your model fits, you can
>do better than a historyless method.  The model itself is the
>history, you have to choose the model (you call it a random number
>generator, stoichastic methods, LPC methods, and such suggest other
>approaches) from the knowledge of the signal.  That's your
>history.

I'm not sure I agree with you.

The little program I used previously to demonstrate that `p lg p' could be 
beaten by an historyless technique does not, to my mind, build any model of 
the data.

It does not examine the data to `fit' the random number generator to it 
(although I have others that do this too). It just compares the input with a 
bit stream and records where they're different and stores the `deltas'
between these positions in the output. Nothing is kept to `improve' anything 
as we go along; nothing is assumed before we start (the random bit
generator was selected at random). 

I guess the mere _selection_ of the random bit generator _may_ be considered 
building a model of the data -- in which case _any_ compression technique 
selects such a model & there _is_ no historyless compression (as you
seem to imply that model-building and historylessness are exclusive).

But to get back to my original question.

In article <20135@alice.att.com> jj@alice.att.com (jj, like it or not) writes:
>First, the question of "how much can I compress X" must
>be qualified.  If you have a historyless compression,
>you can compress up to the entropy ( sum of -p log p) bound,
                                                      ^^^^^^
>where p is the probability of each token.

Is p lg p a hard-and-fast bound or not? I still think it's an average.

-kym