Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!rpi!sarah!bingnews!kym From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) Newsgroups: comp.compression Subject: Re: theoretical compression factor Message-ID: <1991Apr2.034441.28170@bingvaxu.cc.binghamton.edu> Date: 2 Apr 91 03:44:41 GMT References: <20135@alice.att.com> <1991Mar29.031127.9128@bingvaxu.cc.binghamton.edu> <20144@alice.att.com> Organization: State University of New York at Binghamton Lines: 38 In article <20144@alice.att.com> jj@alice.UUCP (jj, like it or not) writes: >What you're doing is building a model. If your model fits, you can >do better than a historyless method. The model itself is the >history, you have to choose the model (you call it a random number >generator, stoichastic methods, LPC methods, and such suggest other >approaches) from the knowledge of the signal. That's your >history. I'm not sure I agree with you. The little program I used previously to demonstrate that `p lg p' could be beaten by an historyless technique does not, to my mind, build any model of the data. It does not examine the data to `fit' the random number generator to it (although I have others that do this too). It just compares the input with a bit stream and records where they're different and stores the `deltas' between these positions in the output. Nothing is kept to `improve' anything as we go along; nothing is assumed before we start (the random bit generator was selected at random). I guess the mere _selection_ of the random bit generator _may_ be considered building a model of the data -- in which case _any_ compression technique selects such a model & there _is_ no historyless compression (as you seem to imply that model-building and historylessness are exclusive). But to get back to my original question. In article <20135@alice.att.com> jj@alice.att.com (jj, like it or not) writes: >First, the question of "how much can I compress X" must >be qualified. If you have a historyless compression, >you can compress up to the entropy ( sum of -p log p) bound, ^^^^^^ >where p is the probability of each token. Is p lg p a hard-and-fast bound or not? I still think it's an average. -kym