Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!ncar!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein) Newsgroups: comp.lang.c Subject: Re: hash function Message-ID: <29997.Jun2503.57.3491@kramden.acf.nyu.edu> Date: 25 Jun 91 03:57:34 GMT References: <14491.28619071@stjhmc.fidonet.org> <14583@dog.ee.lbl.gov> <15027@ulysses.att.com> Organization: IR Lines: 27 In article <15027@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes: > A good property for a hash function is if you keep feeding it > a constant character, it performs as if it's a random number generator. No, I don't think so. This is commonly true of good hash functions and commonly false of bad ones, but keep in mind that pseudo-random sequences have to keep working 100, 1000, 1000000 elements later, while simple string hash functions are rarely applied to strings with more than 80 characters. [ as a generator mod 2^n, 37 is better than 33 is better than 31 ] Again, I don't think this is accurate. Each number has a sufficiently long period mod 2^n that you don't have to worry about repeats. > We typically hash byte strings and we want > the bits to mix in the 16-bit or 32-bit registers relatively fast. [ so use a multipler of 261 ] Again, practically any good multiplier works. I think you're worrying about the fact that 31c + d doesn't cover any reasonable range of hash values if c and d are between 0 and 255. That's why, when I discovered the 33 hash function and started using it in my compressors, I started with a hash value of 5381. I think you'll find that this does just as well as a 261 multiplier. ---Dan