Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!bu.edu!m2c!jjmhome!smds!rh From: rh@smds.UUCP (Richard Harter) Newsgroups: comp.lang.c Subject: Re: hash function for mac needed Summary: Perfection is indeed a costly virtue Message-ID: <567@smds.UUCP> Date: 19 Jun 91 04:38:20 GMT References: <14207.285B768A@stjhmc.fidonet.org> <14396@dog.ee.lbl.gov> Organization: SMDS Inc., Concord, MA Lines: 47 In article <14396@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes: > Indeed: as it turns out, while many people have spent much time looking > for `good' hashing functions, fewer seem to have noted that many > hashing systems spend more time computing hashes than searching. I > find that this is usually the case, and that a fast hash function is > better than a slower-but-better-distribution function. One good hash > function, useful in many applications, is this: > hash = 0; > for (each character) > hash = (hash * 33) + (the character); > hash_index = hash & ((some power of two) - 1); And it may well pay to be even more radical in trading speed for 'perfection'. In one of my programs I found it quite profitable to only look at the last ten characters of a string, summing all but the last two, and doing shift and sum on the last two. E.g. if n is the string length hash = 0; cc = string + n; switch (n>10 ? 10 : n) { case 10: hash += (int)*--cc; case 9: hash += (int)*--cc; case 8: hash += (int)*--cc; case 7: hash += (int)*--cc; case 6: hash += (int)*--cc; case 5: hash += (int)*--cc; case 4: hash += (int)*--cc; case 3: hash += (int)*--cc; case 2: hash += ((int)*--cc)<<1; case 1: hash += ((int)*--cc)<<3; default: break; } hash &= 1023; Admittedly this appears crude, but in practice the last ten bytes were all that were really needed to distinguish most strings (in the application of interest, of course.) Nor was it really needful to do anything more than summing bytes. I should note that the string length was stored in the table; in effect it served as a secondary key for the bucket search loop. The speed up was quite sharp. -- Richard Harter, Software Maintenance and Development Systems, Inc. Net address: jjmhome!smds!rh Phone: 508-369-7398 US Mail: SMDS Inc., PO Box 555, Concord MA 01742 This sentence no verb. This sentence short. This signature done.