Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ginosko!uunet!ncrlnk!ncr-sd!hp-sdd!ucsdhub!sdcsvax!network.ucsd.edu!net1!rmyers
From: rmyers@net1.ucsd.edu (Robert Myers)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Wanted:fast text compression source
Summary: Info on Bookmaster text compression
Keywords: compression,text
Message-ID: <1961@network.ucsd.edu>
Date: 11 Sep 89 15:24:51 GMT
References: <13511@well.UUCP> <1958@network.ucsd.edu> <1656@mtunb.ATT.COM>
Sender: nobody@network.ucsd.edu
Reply-To: thane@sdnp1.ucsd.edu (Thane Plummer)
Distribution: comp
Organization: UCSD Network Operations Group
Lines: 53

In article <1656@mtunb.ATT.COM> dmt@mtunb.UUCP (Dave Tutelman) writes:
>In article <1958@network.ucsd.edu> rmyers@net1.UUCP (Robert Myers) writes:
>>In article <13511@well.UUCP> alcmist@well.UUCP (Frederick Wamsley) writes:
>>>I'm looking for text compression code which will be used to compress blocks
>>> ...
>>Contact Bookmaster Corp. in Telluride, CO...
>>As I recall, they have a generic cruncher that will compress, index,
>>and create a dictionary on text files with the performance you
>>require (30K in 10sec on AT).
>
>I may be missing something, but what's the difference between this
>and the "archivers" that we all use like PKZIP, zoo, ARC (careful..)
>etc?
>   -	Function (sounds very similar)?
>   -	Speed?
>   -	Compression ratio?
>

Sorry for the lack of more specific information in my original
posting.  Essentially, Bookmaster's program (I'm not sure of the name)
takes a text file and *compresses* it.  This process creates two
files: one consists of a word dictionary and index, and the other is
the original text coded into one- and two-byte tokens that correspond
to dictionary entries.  Compressing a small (30K) text file takes
about 10 seconds on an AT.
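For anyone curious how such a token file might be built, here is a rough sketch in Python.  It is NOT Bookmaster's format, which I don't know.  It assumes a hypothetical layout in which the 128 most frequent words get one-byte tokens (0x00-0x7F) and every other word gets a two-byte token whose first byte has the high bit set, leaving 15 bits for the dictionary index.  Words are split on whitespace, so spacing and line breaks are not preserved; the names build_dictionary and encode are made up for illustration.

```python
from collections import Counter

# Hypothetical token layout (not Bookmaster's actual format):
#   0x00-0x7F          one-byte token: dictionary index 0..127
#   0x80-0xFF + byte   two-byte token: index 128 + 15-bit value
ONE_BYTE_SLOTS = 128
MAX_WORDS = ONE_BYTE_SLOTS + 0x8000


def build_dictionary(text):
    """Return the distinct words of `text`, most frequent first."""
    ranked = [w for w, _ in Counter(text.split()).most_common()]
    if len(ranked) > MAX_WORDS:
        raise ValueError("vocabulary too large for two-byte tokens")
    return ranked


def encode(text, dictionary):
    """Replace each whitespace-separated word with a 1- or 2-byte token."""
    index = {w: i for i, w in enumerate(dictionary)}
    out = bytearray()
    for word in text.split():
        i = index[word]
        if i < ONE_BYTE_SLOTS:
            out.append(i)                # frequent word: one byte
        else:
            j = i - ONE_BYTE_SLOTS       # 0 .. 0x7FFF
            out.append(0x80 | (j >> 8))  # high bit marks a two-byte token
            out.append(j & 0xFF)
    return bytes(out)
```

Writing the dictionary out one word per line would give the dictionary file, and the bytes from encode() would be the token file.  Putting the most frequent words in the one-byte slots is what keeps the token file small, since words like "the" and "and" dominate ordinary English text.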

	FUNCTION:  This program is NOT meant for archival
purposes.  Consider that it makes two files out of one, not one file
out of many.  Its purposes are 1) text compression, 2) ultra-fast
decompression of text, and 3) high-speed searching.  As I said
before, it was part of a program for searching legal depositions.
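One reason a tokenized file searches quickly is that you can look a word up in the dictionary once and then scan the token file for a small integer, without decoding any text.  A minimal sketch, assuming the same hypothetical layout as before (one-byte tokens 0x00-0x7F for the first 128 words, high-bit-set two-byte tokens for the rest); this is my illustration, not how Bookmaster does it.  The scan has to step through whole tokens, or the second byte of a two-byte token could be mistaken for a one-byte word.

```python
def iter_tokens(data):
    """Yield dictionary indices from a hypothetical 1/2-byte token stream."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            yield b                      # one-byte token: index 0..127
            i += 1
        else:
            # two-byte token: low 7 bits of b plus the next byte, offset by 128
            yield 128 + (((b & 0x7F) << 8) | data[i + 1])
            i += 2


def find_word(data, dictionary, word):
    """Return the word positions of `word` without decoding any text."""
    try:
        target = dictionary.index(word)  # one dictionary lookup
    except ValueError:
        return []                        # word never occurs in the text
    return [pos for pos, idx in enumerate(iter_tokens(data)) if idx == target]
```

A real tool would keep a hash table or the index file instead of calling dictionary.index(), and could go further by storing each word's positions directly.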

	SPEED:  The initial compression is the slowest step.
To decompress the text, the dictionary file is loaded into RAM
and the tokens are decoded on the fly, so each word is decoded as
it is read from the token file.  I don't know of any exact figures
for decompression speed, but I believe it is in the range of a few
thousand words per second.
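Here is a sketch of that on-the-fly decoding, again assuming a hypothetical layout rather than Bookmaster's real one: one-byte tokens 0x00-0x7F index the first 128 dictionary words, and a byte with the high bit set starts a two-byte token for the rest.  The whole dictionary sits in RAM as a Python list; only the token file is read incrementally, one word per step.  The function name decode_stream is made up for illustration.

```python
import io


def decode_stream(stream, dictionary):
    """Yield words one at a time as tokens are read from a binary stream.

    `dictionary` is a list of words already loaded into memory; the token
    stream is consumed a byte or two at a time, never all at once.
    """
    while True:
        first = stream.read(1)
        if not first:
            return                       # end of token file
        b = first[0]
        if b < 0x80:
            yield dictionary[b]          # one-byte token
        else:
            second = stream.read(1)
            if not second:
                raise ValueError("truncated two-byte token")
            yield dictionary[128 + (((b & 0x7F) << 8) | second[0])]


if __name__ == "__main__":
    words = ["the", "and", "cat"]
    tokens = io.BytesIO(bytes([0, 2, 1, 0]))
    print(" ".join(decode_stream(tokens, words)))  # prints "the cat and the"
```

Because decoding is just an array lookup per token, its speed is limited mostly by how fast the token file can be read.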

	COMPRESSION RATIO:  As always, this varies with the file.
The compressed output ranges from 50% (worst) to 20% (best) of the
size of the original file.  Normally (for a 50-page deposition!), the
compressed text AND dictionary (together, not separately) are about
one third (1/3) the size of the original file.  I know they used this
method to compress the Bible: the compressed text plus the dictionary
come to approx. 1.2 megs (compressed text = 1.03 megs, dictionary =
190K).  I believe the uncompressed text is around 4 to 6 megabytes.
This is considerably better than any of the archivers I've seen,
plus it can be decompressed very quickly.
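A quick check of the Bible figures above.  Treating 190K as 0.19 megs is an approximation, since the numbers don't say whether K means 1000 or 1024 bytes, and the 4 to 6 megabyte original size is only my estimate.

```python
# Sizes quoted for the compressed Bible, in megabytes.
# Assumption: 190K is taken as 0.19 MB (K = 1000 vs 1024 is unspecified).
compressed_text_mb = 1.03
dictionary_mb = 0.19
total_mb = compressed_text_mb + dictionary_mb  # about 1.22 MB

# The original size is only estimated at 4 to 6 MB, so try both ends.
for original_mb in (4.0, 6.0):
    ratio = total_mb / original_mb
    print(f"original {original_mb:.0f} MB: output is {ratio:.1%} of original")
```

So the compressed Bible works out to roughly 20% to 30% of its original size, at or near the "best" end of the range and better than the typical one third.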

Hope this has been of help.

T.K. Plummer