Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!ogicse!cvedc!mcspdx!adpplz!martin From: martin@adpplz.UUCP (Martin Golding) Newsgroups: comp.lang.c Subject: Re: hash function Message-ID: <843@adpplz.UUCP> Date: 26 Jun 91 20:11:34 GMT References: <14491.28619071@stjhmc.fidonet.org> <14583@dog.ee.lbl.gov> <29997.Jun2503.57.3491@kramden.acf.nyu.edu> <15047@ulysses.att.com> Organization: ADP Dealer Services R&D, Portland, OR Lines: 35 In <15047@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes: >For everyone's entertainment, I did a simple (and admittedly contrived) test >of hashing 1000000 numbers starting at 1000000 into 1000000 buckets. >This is done by treating each number which is stored in a word as a >string of 4 bytes. With 261, 927300 of these numbers are >in their own buckets, the rest are in 36350 buckets each with 2 elements. >This is almost as good as a perfect hash function. The result for 33 is >much worse. The largest bucket has 64 elements in it and there are 4050 >such buckets. The key difference is the 8th bit in 261 which causes the >mixing to go a little faster. Your test is probably irrelevant, being contiguous sequential numbers. The best hash in this case is for( hash = (i = 0); i < len(numstr); hash = hash*10 + numstring[i++]); hash = hash % numberofbuckets; This will yield the best possible distribution for any number of hash buckets. The proof is left to the reader. The point is not that there is a better hash for sequential numbers, but that the performance of _any_ hash function is highly dependent on your keys; if you any idea of the values or kinds of values to expect you can choose a better function. If you can _control_ the values of your keys you can sometimes get a _much_ better function; viz the numbers above. If all this is old and boring stuff, I apologise for interrupting you all just to look like an idiot (flame me by email), and I now return you to your regularly scheduled newsgroup. Martin Golding | sync, sync, sync, sank ... sunk: Dod #0236 | He who steals my code steals trash. A poor old decrepit Pick programmer. Sympathize at: {mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp