Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!cs.utexas.edu!radar!cadillac!Pkg.Mcc.COM!steve From: steve@Pkg.Mcc.COM (Steve Madere) Newsgroups: comp.databases Subject: Re: In Search Of a key compression algorithm! Keywords: Non-lexigraphically destructive. Message-ID: <1991May31.141508@Pkg.Mcc.COM> Date: 31 May 91 19:15:08 GMT References: <35014@mimsy.umd.edu> Sender: news@cadillac.CAD.MCC.COM Reply-To: steve@Pkg.Mcc.COM (Steve Madere) Lines: 51 In article <35014@mimsy.umd.edu>, jerryw@tove.cs.umd.edu (Jerry Wieber) writes: | I am looking for an algorithm that can compress keys without destroying | lexigraphic ordering. A reply to the effect of "this is impossible" is | equally helpful, of course. Any compression, no matter how small, may | be useful. To wit, I have a very large number of variable length keys | from 1 to 64 bytes in length, and I have got to get the size down for | sorting.... | | | All replies gratefully appreciated! | -Jerry | -- | __________ | UUCP: uunet!cs.umd.edu!jerryw SPOKEN: Jerry Wieber |/ `-. | U of Md | INTERNET: jerryw@cs.umd.edu "Disclaimer" \_|.|-, | ` - If you have a truly large set, you will probably have lots of sets of adjacent keys in which they share alot of common leading symbols. American Airlines American Antiques American Antlers Amerigo Vespucci etc. What you can do is to devote the first byte of each key to count the number of leading characters in common with the previous key so that the above list would become. American Airlines '\0x0a'ntiques '\0x0c'lers '\0x05'go Vespucci This saves 24 chars out of 40. This can only be used to store them in the final list however. During sorting they will obviously have to be represented in their full sized form. I can't think of a good search technique through such a sorted list for the moment but I'm sure you can if you spend a few moments to think about it. I realize this is a pretty braindead way to compact a key list but it would in certain cases afford significant storage savings.