Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!udel!haven.umd.edu!purdue!decwrl!ucbvax!ulysses!kpv From: kpv@ulysses.att.com (Phong Vo[drew]) Newsgroups: comp.lang.c Subject: Re: hash function Message-ID: <15047@ulysses.att.com> Date: 25 Jun 91 21:27:14 GMT References: <14491.28619071@stjhmc.fidonet.org> <14583@dog.ee.lbl.gov> <29997.Jun2503.57.3491@kramden.acf.nyu.edu> Organization: AT&T Bell Laboratories, Murray Hill Lines: 27 > Again, practically any good multiplier works. I think you're worrying > about the fact that 31c + d doesn't cover any reasonable range of hash > values if c and d are between 0 and 255. That's why, when I discovered > the 33 hash function and started using it in my compressors, I started > with a hash value of 5381. I think you'll find that this does just as > well as a 261 multiplier. > > ---Dan 261 is just a number that was picked out of a hat that satisfies the conditions outlined in the theorems in Knuth. These conditions are pretty minimal so yes there are lots and lots of numbers that would do just as well. For compressors, if you work with binary files, it isn't infrequent to have strings of 0's. In such a case, you'll want the multiplier to have the nice properties of a RNG. For this reason, 261 type of numbers may be a little better than 33 (in either case, don`t forget to start the hash sum at some odd value!). For everyone's entertainment, I did a simple (and admittedly contrived) test of hashing 1000000 numbers starting at 1000000 into 1000000 buckets. This is done by treating each number which is stored in a word as a string of 4 bytes. With 261, 927300 of these numbers are in their own buckets, the rest are in 36350 buckets each with 2 elements. This is almost as good as a perfect hash function. The result for 33 is much worse. The largest bucket has 64 elements in it and there are 4050 such buckets. The key difference is the 8th bit in 261 which causes the mixing to go a little faster.