Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!nrl-cmf!ukma!gatech!hubcap!ncrcae!ncrlnk!uunet!mcvax!ukc!cs.tcd.ie!tcdmath!mwebb
From: mwebb@maths.tcd.ie (Mark Webb)
Newsgroups: comp.sys.ibm.pc
Subject: Data Tokenisation
Message-ID: <179@maths.tcd.ie>
Date: 9 Dec 88 23:38:36 GMT
Organization: Maths Dept., Trinity College, Dublin
Lines: 16


I'm looking for information about tokenisation of data
in a database where space is at a premium. Isn't it always!

I have a data file (about 200k), mainly consisting of names and addresses,
which needs to be stored (and retrieved) more efficiently.
(For example, each street name could be stored as a single token.)

The tokenisation process (analysis, storage etc) need not necessarily
be fast, but the de-tokenisation process must be rapid
(probably by means of a `look-up' table of some sort).
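
To illustrate the sort of look-up scheme meant here, a minimal sketch in C
follows. It assumes the original text is 7-bit ASCII, so that byte values
0x80 and above are free to act as tokens. The dictionary entries and the
name detokenise() are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary: token byte (0x80 + i) expands to dict[i]. */
static const char *const dict[] = {
    " Street",   /* 0x80 */
    " Road",     /* 0x81 */
    " Avenue",   /* 0x82 */
    "Dublin ",   /* 0x83 */
};
#define DICT_SIZE (sizeof dict / sizeof dict[0])

/* Expand the NUL-terminated tokenised text `in` into `out`, which has
   room for `cap` bytes including the terminating NUL.
   Returns the expanded length, or (size_t)-1 if the output would not
   fit or an unknown token is met. */
size_t detokenise(const unsigned char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        if (*in >= 0x80) {
            /* Token byte: one table index, one copy. */
            size_t idx = (size_t)(*in - 0x80);
            size_t len;

            if (idx >= DICT_SIZE)
                return (size_t)-1;
            len = strlen(dict[idx]);
            if (n + len >= cap)
                return (size_t)-1;
            memcpy(out + n, dict[idx], len);
            n += len;
        } else {
            /* Ordinary character: copy it through unchanged. */
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = (char)*in;
        }
    }
    out[n] = '\0';
    return n;
}
```

With this table, the 14 stored bytes "12 Pearse\x80, \x83" "2" would expand
to "12 Pearse Street, Dublin 2". Each token costs one table index and one
copy to expand, so all the expensive work (choosing which strings go in the
dictionary) is left to the slow tokenisation pass.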

If anyone could send me some references or better still some code
(C, ASM, Pascal), I would be most grateful.
-- 
-Mark Webb			mwebb@maths.tcd.ie