Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-lcc!styx!ames!necntc!linus!philabs!micomvax!musocs!mcgill-vision!mouse
From: mouse@mcgill-vision.UUCP (der Mouse)
Newsgroups: sci.crypt,comp.sys.ibm.pc,comp.sources.wanted
Subject: Re: Need information on data compression algorithms
Message-ID: <743@mcgill-vision.UUCP>
Date: Sun, 26-Apr-87 08:44:40 EDT
Article-I.D.: mcgill-v.743
Posted: Sun Apr 26 08:44:40 1987
Date-Received: Sun, 3-May-87 18:39:20 EDT
References: <528@savax.UUCP> <635@ttidca.UUCP> <4542@columbia.UUCP> <1323@ihdev.ATT.COM>
Organization: McGill University, Montreal
Lines: 31
Xref: mnetor sci.crypt:359 comp.sys.ibm.pc:3718 comp.sources.wanted:1072

In article <1323@ihdev.ATT.COM>, pdg@ihdev.ATT.COM (Joe Isuzu) writes:
> In article <4542@columbia.UUCP> metzger@garfield.columbia.edu.UUCP (Perry Metzger) writes:
>> Look folks, as we all know from being cryptographers [...] that the
>> redundancy of english is NEVER estimated [...] at more than 75%, so
>> thus it is impossible to compress it beyond that in general.
> Unless you take symbolic representation into account.  Say you
> numbered the (2^14)-1 words, and used that to replace just the
> absolute matches (forgetting about endings etc),

What you appear to be suggesting is not just compressing English text but representing it at a higher level. If you "compress" beyond the point of discarding all redundancy you are losing information; in other words, there will be distinct input texts which compress to the same compressed text, which means it will be impossible to correctly regenerate both of them from the compressed text. For example, in English there are often two (or more) ways to represent the same concept. We are not interested in merely storing ideas (or if we are, we should be explicit in admitting it), we want to compress text and then uncompress to the exact same sequence of bytes.
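As a rough illustration of distinct inputs collapsing to the same compressed text, here is a minimal Python sketch of the word-numbering idea quoted above. Everything in it is an assumption made for the example: the eight-word VOCAB stands in for the proposed (2^14)-1 word list, the encode/decode names are hypothetical, and folding case and whitespace is just one way of reading "absolute matches (forgetting about endings etc)".

```python
# Toy word-numbering coder: each known word becomes a 14-bit index.
# VOCAB is a made-up stand-in for the proposed (2^14)-1 word list.
VOCAB = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
INDEX = {word: i + 1 for i, word in enumerate(VOCAB)}  # 0 is reserved for "unknown"
BITS_PER_WORD = 14


def encode(text):
    """Map text to a list of word numbers, ignoring case and spacing."""
    return [INDEX.get(word.lower(), 0) for word in text.split()]


def decode(codes):
    """Best-effort reconstruction; case, spacing and unknown words are gone."""
    return " ".join(VOCAB[c - 1] if c else "?" for c in codes)


a = "The quick brown fox"
b = "the  QUICK brown fox"  # different bytes: case and an extra space

codes_a = encode(a)
codes_b = encode(b)

# Two distinct inputs yield the same code stream, so no decoder can
# regenerate both of them from it.
print("same codes:", codes_a == codes_b)
print("round trip restores a:", decode(codes_a) == a)
print(len(codes_a) * BITS_PER_WORD, "bits coded vs", len(a) * 8, "bits of input")
```

To get the exact bytes back, a scheme like this would need escapes for capitalization, spacing, punctuation and words outside the list, and each of those costs bits that the simple 14-bits-per-word count leaves out.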
Alternatively, your posting could be taken to mean that you believe there is more than 75% redundancy in English. In the case of Usenet postings I fear you are correct - maybe your software would improve netnews performance! -- what's that? You don't have this implemented? Then how do you know it works? Go ahead and try it; I suspect you will find it doesn't compress by as much as you think it will.

					der Mouse

				(mouse@mcgill-vision.uucp)