Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ginosko!uunet!ncrlnk!ncr-sd!hp-sdd!ucsdhub!sdcsvax!network.ucsd.edu!net1!rmyers
From: rmyers@net1.ucsd.edu (Robert Myers)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Wanted:fast text compression source
Summary: Info on Bookmaster text compression
Keywords: compression,text
Message-ID: <1961@network.ucsd.edu>
Date: 11 Sep 89 15:24:51 GMT
References: <13511@well.UUCP> <1958@network.ucsd.edu> <1656@mtunb.ATT.COM>
Sender: nobody@network.ucsd.edu
Reply-To: thane@sdnp1.ucsd.edu (Thane Plummer)
Distribution: comp
Organization: UCSD Network Operations Group
Lines: 53

In article <1656@mtunb.ATT.COM> dmt@mtunb.UUCP (Dave Tutelman) writes:
>In article <1958@network.ucsd.edu> rmyers@net1.UUCP (Robert Myers) writes:
>>In article <13511@well.UUCP> alcmist@well.UUCP (Frederick Wamsley) writes:
>>>I'm looking for text compression code which will be used to compress blocks
>>> ...
>>Contact Bookmaster Corp. in Telluride, CO...
>>As I recall, they have a generic cruncher that will compress, index,
>>and create a dictionary on text files with the performance you
>>require (30K in 10sec on AT).
>
>I may be missing something, but what's the difference between this
>and the "archivers" that we all use like PKZIP, zoo, ARC (careful..)
>etc?
>	- Function (sounds very similar)?
>	- Speed?
>	- Compression ratio?
>

Sorry for the lack of more specific information in the original
posting. Essentially, Bookmaster's program (not sure of the name) will
take a text file and *compress* it. This process creates two files:
one consists of a word dictionary and index, and the other is the
original text coded into one- and two-byte tokens that correspond to
the dictionary entries. It takes about 10 seconds to compress a small
(30K) text file on an AT.

FUNCTION: The function of this program is NOT for archival purposes.
Consider that it makes two files out of one, not one file out of many.
Its purpose is 1) text compression, 2) ultra-fast decompression of
text, and 3) high-speed searching. As I said before, this was part of
a program for searching legal depositions.

SPEED: The initial compression is what takes the longest. To
decompress the text, the dictionary file is loaded into RAM and the
tokens are decoded on the fly, so the words can be decoded as they are
read from the text token file. I'm not sure of any exact
specifications on decompression speed, but I believe that it is in the
area of a few thousand words per second.

COMPRESSION RATIO: As always, this varies with the file. The
compressed size ranges from 50% (worst) to 20% (best) of the size of
the original file. Normally (for a 50-page deposition!), the
compressed text AND dictionary (together, not separately) are one
third (1/3) the size of the original file.

I know they used this method to compress the Bible. The size of the
total text plus the dictionary is approx. 1.2 megs. Compressed text =
1.03 megs, dictionary = 190K. I believe the uncompressed text is
around 4 to 6 megabytes. This is considerably better than any of the
archivers I've seen, plus it can be decompressed very quickly.

Hope this has been of help.

T.K. Plummer
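
For illustration only, the general idea described above (a word
dictionary held in RAM plus a stream of one- and two-byte tokens) can
be sketched roughly as follows. This is a guess at the scheme, not
Bookmaster's actual code: the function names, the choice to give the
128 most frequent pieces one-byte tokens, the use of the high bit to
mark two-byte tokens, and the treatment of spaces and punctuation as
dictionary entries are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed token layout (not from the original program):
#   0xxxxxxx            -> one-byte token, dictionary entries 0..127
#   1xxxxxxx yyyyyyyy   -> two-byte token, entries 128..32895
ONE_BYTE_LIMIT = 128
MAX_ENTRIES = ONE_BYTE_LIMIT + 32768


def split_pieces(text):
    """Split text into words and the runs between them, so that
    joining the pieces restores the text exactly."""
    return re.findall(r"\w+|\W+", text)


def build_dictionary(text):
    """Return the distinct pieces, most frequent first, so the most
    common ones receive the short one-byte tokens."""
    counts = Counter(split_pieces(text))
    words = [piece for piece, _ in counts.most_common()]
    if len(words) > MAX_ENTRIES:
        raise ValueError("too many distinct pieces for this token scheme")
    return words


def compress(text, words):
    """Encode the text as a byte string of one- and two-byte tokens."""
    index = {piece: i for i, piece in enumerate(words)}
    out = bytearray()
    for piece in split_pieces(text):
        i = index[piece]
        if i < ONE_BYTE_LIMIT:
            out.append(i)
        else:
            j = i - ONE_BYTE_LIMIT
            out.append(0x80 | (j >> 8))   # high bit marks a two-byte token
            out.append(j & 0xFF)
    return bytes(out)


def decompress(tokens, words):
    """Decode tokens on the fly using the in-memory dictionary."""
    pieces = []
    pos = 0
    while pos < len(tokens):
        first = tokens[pos]
        if first < 0x80:
            pieces.append(words[first])
            pos += 1
        else:
            j = ((first & 0x7F) << 8) | tokens[pos + 1]
            pieces.append(words[j + ONE_BYTE_LIMIT])
            pos += 2
    return "".join(pieces)
```

Decoding is just a table lookup per token, which would fit the fast
decompression mentioned above, and a word search could in principle
look up the word's token once and then scan the token file for it
rather than comparing text.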