Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!nrl-cmf!ukma!gatech!hubcap!ncrcae!ncrlnk!uunet!mcvax!ukc!cs.tcd.ie!tcdmath!mwebb
From: mwebb@maths.tcd.ie (Mark Webb)
Newsgroups: comp.sys.ibm.pc
Subject: Data Tokenisation
Message-ID: <179@maths.tcd.ie>
Date: 9 Dec 88 23:38:36 GMT
Organization: Maths Dept., Trinity College, Dublin
Lines: 16

I'm looking for information about tokenisation of data in a database where
space is at a premium. Isn't it always!

I have a data file (about 200k) consisting mainly of names and addresses,
which needs to be stored (and retrieved) more efficiently. For example, each
street name could be stored as a single token.

The tokenisation process (analysis, storage, etc.) need not be fast, but
de-tokenisation must be rapid (probably by means of a `look-up' table of
some sort).

If anyone could send me some references, or better still some code (C, ASM,
Pascal), I would be most grateful.

--
-Mark Webb
mwebb@maths.tcd.ie
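One common way to get the slow-encode, fast-decode behaviour asked for above is a static dictionary code. The following is a minimal C sketch under stated assumptions, not a definitive implementation. It assumes the text is 7-bit ASCII, so byte values 0x80-0xFF are free to act as tokens (at most 128 dictionary entries). The word list `dict` and the functions `tokenise` and `detokenise` are hypothetical; a real word list would be built by counting frequent strings (street names, towns) in the 200k file. Encoding does a slow greedy longest-match scan, which the post says is acceptable. Decoding is a single table lookup per token byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token dictionary; a real one would be derived from
   frequency counts over the actual name-and-address file. */
static const char *const dict[] = {
    "Street", "Road", "Avenue", "Dublin", "Lane", "Park", ", Co. "
};
#define NTOKENS (sizeof dict / sizeof dict[0])
#define TOKEN_BASE 0x80 /* bytes >= 0x80 are tokens; input assumed 7-bit ASCII */

_Static_assert(NTOKENS <= 128, "only 128 token byte values are available");

/* Encode NUL-terminated src into dst, replacing the longest dictionary
   match at each position with one token byte. Returns bytes written.
   dst must have room for strlen(src) bytes (output is never longer). */
size_t tokenise(const char *src, unsigned char *dst)
{
    size_t n = 0;
    while (*src) {
        size_t best = NTOKENS, bestlen = 0;
        for (size_t i = 0; i < NTOKENS; i++) {
            size_t len = strlen(dict[i]);
            if (len > bestlen && strncmp(src, dict[i], len) == 0) {
                best = i;
                bestlen = len;
            }
        }
        if (best < NTOKENS) {
            dst[n++] = (unsigned char)(TOKEN_BASE + best); /* emit token */
            src += bestlen;
        } else {
            dst[n++] = (unsigned char)*src++; /* copy literal character */
        }
    }
    return n;
}

/* Decode len bytes of src into dst and NUL-terminate it. Each token byte
   is expanded by indexing the look-up table directly. Returns text length. */
size_t detokenise(const unsigned char *src, size_t len, char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] >= TOKEN_BASE) {
            assert((size_t)(src[i] - TOKEN_BASE) < NTOKENS);
            const char *w = dict[src[i] - TOKEN_BASE];
            size_t wl = strlen(w);
            memcpy(dst + n, w, wl);
            n += wl;
        } else {
            dst[n++] = (char)src[i];
        }
    }
    dst[n] = '\0';
    return n;
}
```

For example, with this dictionary `12 Park Road, Dublin` (20 bytes) encodes to 9 bytes and decodes back unchanged. If decoding speed matters more than a few bytes of table space, the word lengths could be precomputed into a second array instead of calling `strlen` per token.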