Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!turnkey!orchard.la.locus.com!fafnir.la.locus.com!cjkuo
From: cjkuo@locus.com (Chengi Jimmy Kuo)
Newsgroups: comp.compression
Subject: Re: fast string compr.?
Message-ID: <1991May25.235703.159943@locus.com>
Date: 25 May 91 23:57:03 GMT
References: <LACA.91May20141838@vipunen.hut.fi>
Distribution: comp
Organization: Locus Computing Corporation, Los Angeles, California
Lines: 27

laca@vipunen.hut.fi (Laszlo C Balint) writes:

>I am looking for an algorithm to compress and decompress strings of
>variable length. The compressed sequence can be fixed length too, but
>the process should be effective and reversable. I need it for a spell 
>checker, where the number of words might be huge, but all they have 
>to fit in the memory becuase of a frequent access (~20 lookups/checked
>word).

>Any help appreciated, especially if a C(++) code :-)

Sorry, no C++, but...

Compress the dictionary but having a table for the first 2 or 3 letters
(26x26, or 26x26x26, but actually much less since there are no qbX, etc.).
For the words, once you're passed the first 3 letters, start compressing by
using "repeat deltas".  A word is composed of N letters that repeat from the
previous word, followed by the delta.  You can compress the delta portion
using whatever method you want since it's straight alphabet letters (but I
choose not to).  Compression may not be the best, but data access is pretty
fast since decompression is trivial (also, strcmp is partially completed
due to this encoding).

Jimmy Kuo
-- 
cjkuo@locus.com
"The correct answer to an either/or question is both!"