Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!think.com!spool.mu.edu!news.cs.indiana.edu!att!att!ulysses!kpv From: kpv@ulysses.att.com (Phong Vo[drew]) Newsgroups: comp.lang.c Subject: Re: hash function Summary: a little theory Message-ID: <15027@ulysses.att.com> Date: 23 Jun 91 19:03:19 GMT References: <14491.28619071@stjhmc.fidonet.org> <14583@dog.ee.lbl.gov> Organization: AT&T Bell Laboratories, Murray Hill Lines: 36 Somewhere in this thread, Chris Torek proposed as a hash function: > >> hash = 0; > >> for (each character) > >> hash = (hash * 33) + (the character); > >> hash_index = hash & ((some power of two) - 1); > Chris then asked, > ... What *does* that 33 do? > > Maybe there are some theory people out there who can explain it. > Probably some staring at Knuth vol. 3 would help. A good property for a hash function is if you keep feeding it a constant character, it performs as if it's a random number generator. The hash function f(h,c) = 33*h + c is derived from a class of random number generators called linear congruential. At the start of Knuth Vol 2, there is a good discussion of these generators. Let's apply some of the theorems in that section and see what we get. Keep in mind that we are working with random numbers mod 2^x where x is usually 16 or 32. Assuming that c is some fixed odd value, Theorem A guarantees that 33 is good because (33-1)%4 == 0. In the same reasoning, 31 is not good since (31-1)%4 == 2 != 0. Here good means that if you keep feeding c, you'll cycle thru the entire set of values [0,2^x). On the other hand, 33 is not a primitive element for 2^x when x > 3 (Theorem C), so it isn't a good multiplicator if you keep feeding in 0's. For example, 37 is a better value from that point of view. To summarize, a good multiplicator is any number with 101 as their low order 3 bits. At this point, we depart from theory. We typically hash byte strings and we want the bits to mix in the 16-bit or 32-bit registers relatively fast. This suggests that some value larger than 2^8 (but not too large) may do better as a multiplicator. After a little experimentation, here's a pretty decent hash function: hash = 0; for(each byte c) hash = (hash<<8) + (hash<<2) + hash + c; This corresponds to the multiplicator 261 whose bit pattern is 100000101. Try it, you'll like it.