Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!tektronix!uw-beaver!cornell!vax135!petsd!peora!joel From: joel@peora.UUCP (Joel Upchurch) Newsgroups: net.math,net.micro Subject: Re: Data compression and information theory Message-ID: <1398@peora.UUCP> Date: Wed, 31-Jul-85 09:58:03 EDT Article-I.D.: peora.1398 Posted: Wed Jul 31 09:58:03 1985 Date-Received: Sun, 4-Aug-85 04:48:48 EDT References: <417@lasspvax.UUCP> Organization: Perkin-Elmer SDC, Orlando, Fl. Lines: 26 Xref: linus net.math:1789 net.micro:10038 >What can people tell me about data compression or encoding schemes other >than HUffman encoding? One way to do this is to build a dictionary of 50,000 or so words, then you can compress the text by representing each word as a half-word which stands for the nth word in the dictionary. You can also achieve more compression by putting frequently used phrases in the dictionary too, but that might increase compression time. It you assume that the average word has 6 characters that gives you between 2 and 3 bits per character. I think I read about this idea a long time ago in one of Wayne Green's editorials in Kilobaud. If you have a machine with enough memory to keep the dictionary in memory, then compression and decompression of the text should be very rapid. I read an announcement from, I think, Sperry, that was for a machine that could do extremely rapid text retievals and key-word searches. From the description I think they may be using a scheme similar to this. As you can see, key-word searches of the compressed text would be extremely rapid with this scheme. This approach wouldn't be very useful for sending data over serial lines unless both ends had the same dictionary.