Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!ncar!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.c
Subject: Re: hash function
Message-ID: <29997.Jun2503.57.3491@kramden.acf.nyu.edu>
Date: 25 Jun 91 03:57:34 GMT
References: <14491.28619071@stjhmc.fidonet.org> <14583@dog.ee.lbl.gov> <15027@ulysses.att.com>
Organization: IR
Lines: 27

In article <15027@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes:
> A good property for a hash function is if you keep feeding it
> a constant character, it performs as if it's a random number generator.

No, I don't think so. This is commonly true of good hash functions and
commonly false of bad ones, but keep in mind that pseudo-random
sequences have to keep working 100, 1000, 1000000 elements later, while
simple string hash functions are rarely applied to strings with more
than 80 characters.

  [ as a generator mod 2^n, 37 is better than 33 is better than 31 ]

Again, I don't think this is accurate. Each number has a sufficiently
long period mod 2^n that you don't have to worry about repeats.

> We typically hash byte strings and we want
> the bits to mix in the 16-bit or 32-bit registers relatively fast.
  [ so use a multipler of 261 ]

Again, practically any good multiplier works. I think you're worrying
about the fact that 31c + d doesn't cover any reasonable range of hash
values if c and d are between 0 and 255. That's why, when I discovered
the 33 hash function and started using it in my compressors, I started
with a hash value of 5381. I think you'll find that this does just as
well as a 261 multiplier.

---Dan