Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!turnkey!orchard.la.locus.com!fafnir.la.locus.com!cjkuo From: cjkuo@locus.com (Chengi Jimmy Kuo) Newsgroups: comp.compression Subject: Re: fast string compr.? Message-ID: <1991May25.235703.159943@locus.com> Date: 25 May 91 23:57:03 GMT References: Distribution: comp Organization: Locus Computing Corporation, Los Angeles, California Lines: 27 laca@vipunen.hut.fi (Laszlo C Balint) writes: >I am looking for an algorithm to compress and decompress strings of >variable length. The compressed sequence can be fixed length too, but >the process should be effective and reversable. I need it for a spell >checker, where the number of words might be huge, but all they have >to fit in the memory becuase of a frequent access (~20 lookups/checked >word). >Any help appreciated, especially if a C(++) code :-) Sorry, no C++, but... Compress the dictionary but having a table for the first 2 or 3 letters (26x26, or 26x26x26, but actually much less since there are no qbX, etc.). For the words, once you're passed the first 3 letters, start compressing by using "repeat deltas". A word is composed of N letters that repeat from the previous word, followed by the delta. You can compress the delta portion using whatever method you want since it's straight alphabet letters (but I choose not to). Compression may not be the best, but data access is pretty fast since decompression is trivial (also, strcmp is partially completed due to this encoding). Jimmy Kuo -- cjkuo@locus.com "The correct answer to an either/or question is both!"