Path: utzoo!utgpu!news-server.csri.toronto.edu!neuron.ai.toronto.edu!radford Newsgroups: comp.databases From: radford@ai.toronto.edu (Radford Neal) Subject: Re: In Search Of a key compression algorithm! (redirected) Message-ID: <91Jun3.004500edt.128@neuron.ai.toronto.edu> Organization: Department of Computer Science, University of Toronto References: Distribution: comp Date: 3 Jun 91 04:45:21 GMT Lines: 25 >From: jerryw@tove.cs.umd.edu (Jerry Wieber) >Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 > >I am looking for an algorithm that can compress keys without destroying >lexigraphic ordering. A reply to the effect of "this is impossible" is >equally helpful, of course. Any compression, no matter how small, may >be useful. To wit, I have a very large number of variable length keys >from 1 to 64 bytes in length, and I have got to get the size down for >sorting.... Arithmetic coding should work fine for this. Presumably, one would use a fixed model, but one might want to pay attention to context. All that is required to preserve lexigraphic order is that the mapping from letters to numeric regions preserve the alphabetic ordering. This is easily arrainged, perhaps at a slight computational cost. Assuming that these keys are something like peoples' names, compression down to about 4 bits per letter should be readily achievable. I would expect that compression to less than 3 bits per letter might well be possible. Radford Neal