Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!cs.utexas.edu!radar!cadillac!Pkg.Mcc.COM!steve
From: steve@Pkg.Mcc.COM (Steve Madere)
Newsgroups: comp.databases
Subject: Re: In Search Of a key compression algorithm!
Keywords: Non-lexigraphically destructive.
Message-ID: <1991May31.141508@Pkg.Mcc.COM>
Date: 31 May 91 19:15:08 GMT
References: <35014@mimsy.umd.edu>
Sender: news@cadillac.CAD.MCC.COM
Reply-To: steve@Pkg.Mcc.COM (Steve Madere)
Lines: 51

In article <35014@mimsy.umd.edu>, jerryw@tove.cs.umd.edu (Jerry Wieber) writes:
| I am looking for an algorithm that can compress keys without destroying
| lexigraphic ordering.  A reply to the effect of "this is impossible" is
| equally helpful, of course.  Any compression, no matter how small, may
| be useful.  To wit, I have a very large number of variable length keys
| from 1 to 64 bytes in length, and I have got to get the size down for
| sorting....
| 
| 
| All replies gratefully appreciated!
| -Jerry
| --
|                                                             __________ 
|  UUCP:     uunet!cs.umd.edu!jerryw   SPOKEN: Jerry Wieber   |/ `-.   
|  U of Md
|  INTERNET: jerryw@cs.umd.edu              "Disclaimer"           
\_|.|-,  
|                                                                      `
- 

If you have a truly large set, you will probably have lots of sets
of adjacent keys in which they share alot of common leading symbols.

American Airlines
American Antiques
American Antlers
Amerigo Vespucci

etc.

What you can do is to devote the first byte of each key to count
the number of leading characters in common with the previous key
so that the above list would become.

American Airlines
'\0x0a'ntiques
'\0x0c'lers
'\0x05'go Vespucci

This saves 24 chars out of 40.

This can only be used to store them in the final list however.
During sorting they will obviously have to be represented in their
full sized form.

I can't think of a good search technique through such a sorted
list for the moment but I'm sure you can if you spend a few moments
to think about it.

I realize this is a pretty braindead way to compact a key list
but it would in certain cases afford significant storage savings.