Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!zaphod.mps.ohio-state.edu!mips!atha!aunro!alberta!ubc-cs!unixg.ubc.ca!cheddar.ucs.ubc.ca!buckland From: buckland@cheddar.ucs.ubc.ca (Tony Buckland) Newsgroups: comp.databases Subject: Re: In Search Of a key compression algorithm! Keywords: Non-lexigraphically destructive. Message-ID: <1991May29.211100.12909@unixg.ubc.ca> Date: 29 May 91 21:11:00 GMT References: <35014@mimsy.umd.edu> Sender: news@unixg.ubc.ca (Usenet News Maintenance) Organization: Computing Services, University of British Columbia Lines: 16 Nntp-Posting-Host: cheddar.ucs.ubc.ca In article <35014@mimsy.umd.edu> jerryw@tove.cs.umd.edu (Jerry Wieber) writes: >I am looking for an algorithm that can compress keys without destroying >lexigraphic ordering. A reply to the effect of "this is impossible" is >equally helpful, of course. Any compression, no matter how small, may >be useful. ... If you can specify that only a fixed, small character set is used in the keys, usually the case since keys tend to be alphabetic plus a little punctuation, translate each key byte to a 1-byte integer representing the ordering of the characters you use (e.g. 00 = blank, 01 = a, ...), figure out the minimum number of bits you need to express those integers (5 might do it if you want to disregard case, so that a=A, b=B, ...), and compress n bytes of key to n*number of bits. This will get you 3/8 or possibly 2/8 shrinkage in the key length, with no loss of ordering.