Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!att!linac!pacific.mps.ohio-state.edu!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!samsung!olivea!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!phigate!prle!prles2!prl.philips.nl!rogier
From: rogier@prl.philips.nl
Newsgroups: comp.compression
Subject: compress text once, decompress many
Message-ID: <2700@prles2.prl.philips.nl>
Date: 12 Apr 91 07:28:39 GMT
Sender: news@prles2.prl.philips.nl
Organization: Philips Research Laboratories Eindhoven, the Netherlands
Lines: 32


I am looking for a compression scheme for static text: the text is
compressed once and decompressed many times.

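One way of doing it is prefix omission: the text is changed into pointers
to a sorted dictionary, and each dictionary entry is stored as the length
of the prefix it shares with the previous entry plus the remaining suffix,
as in the following example: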

text entry     prefix length       stored suffix
form                 0                 form
formally             4                 ally
format               5                 t

The suffixes can be compressed further by Huffman coding.
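
The prefix omission step can be sketched as follows. This is only a
minimal illustration, assuming the dictionary is sorted and that each
prefix length is measured against the immediately preceding entry; the
names front_encode and front_decode are made up for this example, and the
Huffman stage is left out.

```python
import os

def front_encode(words):
    """Encode a sorted word list as (prefix_length, suffix) pairs."""
    encoded = []
    prev = ""
    for word in words:
        # Number of leading characters shared with the previous entry.
        n = len(os.path.commonprefix([prev, word]))
        encoded.append((n, word[n:]))
        prev = word
    return encoded

def front_decode(encoded):
    """Rebuild the word list from (prefix_length, suffix) pairs."""
    words = []
    prev = ""
    for n, suffix in encoded:
        # Reuse the first n characters of the previous entry.
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words

if __name__ == "__main__":
    entries = ["form", "formally", "format"]
    pairs = front_encode(entries)
    print(pairs)
    print(front_decode(pairs) == entries)
```

Decoding one entry needs all entries before it, so a practical version
would probably restart with a full entry every so many words to keep
random access cheap.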

Ref: "Compression of Concordances in Full-Text Retrieval Systems"
     Y. Choueka et al., ACM SIGIR, 11th Conf. on Research & Development
     in Information Retrieval, June 1988, Grenoble, France.


Another way of doing it is to count the occurrences of all possible
sub-strings in the text, and then use a good heuristic to pick the
sub-strings that go into a dictionary.
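
To make the idea concrete, here is a rough sketch of the counting step
with one naive selection rule. Everything in it is an assumption made for
illustration: the length limits, the dictionary size, and the saving
estimate (each occurrence becomes a one-symbol pointer, and the
sub-string itself is stored once). It is not a known good heuristic, and
it ignores the fact that chosen sub-strings overlap each other.

```python
from collections import Counter

def substring_counts(text, min_len=2, max_len=8):
    """Count every sub-string of length min_len..max_len (overlaps included)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def pick_dictionary(text, size=16, min_len=2, max_len=8):
    """Pick up to `size` sub-strings with the largest estimated saving."""
    counts = substring_counts(text, min_len, max_len)

    def saving(item):
        s, c = item
        # Assumed model: every occurrence shrinks to one pointer symbol,
        # and the sub-string is stored once in the dictionary.
        return c * (len(s) - 1) - len(s)

    ranked = sorted(counts.items(), key=saving, reverse=True)
    return [s for s, c in ranked[:size] if saving((s, c)) > 0]

if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    print(pick_dictionary(sample, size=5))
```

Counting all sub-strings this way costs memory proportional to the text
length times the number of lengths tried, which is tolerable here only
because the compression is done once.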

Questions:
Does anyone know good heuristics for this?
Does anyone know of other solutions or references?


--------------------------------------------------
Rogier Wester
Philips Research Laboratories, The Netherlands.
e-mail: rogier@prle.prl.philips.nl