Xref: utzoo comp.compression:114 alt.comp.compression:190 Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!uunet!viusys!uxui!unislc!ttobler From: ttobler@unislc.uucp (Trent Tobler) Newsgroups: comp.compression,alt.comp.compression Subject: Re: theoretical compression factor Message-ID: <1991Mar26.220519.1242@unislc.uucp> Date: 26 Mar 91 22:05:19 GMT References: <1991Mar25.054838.15588@bingvaxu.cc.binghamton.edu> Organization: unisys Lines: 54 From article <1991Mar25.054838.15588@bingvaxu.cc.binghamton.edu>, by kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell): > > I found one little problem with my algebra & understanding. > Delta modulation takes N bits into O(N) bits, NOT less. > > The theoretical compression factor for my binary strings example > would therefore be > > 2p(1-p) > > where p is the proportion of 0's (or 1's -- it doesn't matter). > > For /usr/dict/words the proportion of 1's is about 56% (only > considering the lsb 7 bits of each char). This should give > a compression factor of about 49% with the method outlined. > > Tables of random ascii digits contain about .38 1's (taking only lsb 7 > bits) which leads to a similar .47 compression factor. If we massage > digit strings to 4-bit groups (by subtracting the '0') we find only > about 6% 1's and therefore the compression factor would be .1 (i.e. a 1Mb > file of random digits -> 100Kb). > > Again, if there are any glaring errors, please comment. > Glaring error #1: if maximum compression of a binary string is 2p(1-p), what is maximum compression of the binary data... 01010101010101010101010101010101 Maximum compression is defined by entropy of the file, or in other words, given a file, how many bits is required to determine uniquely the next bit of information. What is hoped for is that the number of bits is less than 1 for the files you want. Dr. Dobbs had some articles that talked about compression in #173, Feb. 1991. I can give you some of the references... Huffman, D.A. "A Method for the Construction of Minimum Redundancy Codes." _Proceeding of Institute of Electrical and Radio Engineers_ 40 (9) 1098-1101 September, 1952 Shannon, C.E. & W. Weaver _The Mathematical Theory of Communication_ Urbana, Illinois: Univ. of Illinois Press, 1949 Lelewer, Debra A. and Hirschberg, Daniel S. "Data Compression." _ACM Computing Surveys_, vol. 19, no. 3 New York, September, 1987 For more, I suggest you look at the Dr. Dobbs Journal I mentioned. -- Trent Tobler - ttobler@csulx.weber.edu