Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!tektronix!uw-beaver!cornell!vax135!petsd!peora!joel
From: joel@peora.UUCP (Joel Upchurch)
Newsgroups: net.math,net.micro
Subject: Re: Data compression and information theory
Message-ID: <1398@peora.UUCP>
Date: Wed, 31-Jul-85 09:58:03 EDT
Article-I.D.: peora.1398
Posted: Wed Jul 31 09:58:03 1985
Date-Received: Sun, 4-Aug-85 04:48:48 EDT
References: <417@lasspvax.UUCP>
Organization: Perkin-Elmer SDC, Orlando, Fl.
Lines: 26
Xref: linus net.math:1789 net.micro:10038

>What can people tell me about data compression or encoding schemes other
>than HUffman encoding?

One way to do this is to build a dictionary of 50,000 or so words,
then you can compress the text by representing each word as a half-word
which stands for the nth word in the dictionary. You can also achieve
more compression by putting frequently used phrases in the dictionary
too, but that might increase compression time. It you assume that the
average word has 6 characters that gives you between 2 and 3 bits per
character.

I think I read about this idea a long time ago in one of Wayne Green's
editorials in Kilobaud.

If you have a machine with enough memory to keep the dictionary in
memory, then compression and decompression of the text should be
very rapid.

I read an announcement from, I think, Sperry, that was for a machine
that could do extremely rapid text retievals and key-word searches.
From the description I think they may be using a scheme similar to
this. As you can see, key-word searches of the compressed text would
be extremely rapid with this scheme.

This approach wouldn't be very useful for sending data over serial
lines unless both ends had the same dictionary.