Path: utzoo!attcan!uunet!cs.utexas.edu!news-server.csri.toronto.edu!helios.physics.utoronto.ca!ists!yunexus!oz From: oz@yunexus.yorku.ca (Ozan Yigit) Newsgroups: comp.lang.c Subject: Re: Hash functions in C Message-ID: <17394@yunexus.YorkU.CA> Date: 12 Nov 90 17:26:28 GMT References: <14442@goofy.megatest.UUCP> <14461@goofy.megatest.UUCP> Sender: news@yunexus.YorkU.CA Organization: York U. Communications Research & Development Lines: 70 In article <14461@goofy.megatest.UUCP> djones@megatest.UUCP [Dave Jones] writes: >I just now tested some hash-functions for an application which is >probably more typical, namely a compiler. The simplest function is >still a big winner on all counts. Dave, you may also wish to take a look at the following. %A B. J. McKenzie %A R. Harries %A T. Bell %T Selecting a Hashing Algorithm %J Software - Practice and Experience %V 20 %N 2 %D Feb 1990 %P 209-224 This interesting article evaluates some of the more commonly known hash functions from a few sources: Amsterdam Compiler Kit, Eth Modula-2 Cross Compiler, Gnu-cpp, Gnu-cc1, pcc (4.3bsd), C++ (AT&T) and Icon. Their conclusions suggest that the algorithms of the style h(0) = 0; h(i) = k * h(i-1) + ch(i) [for 1 <= i <= n] hash = h(n) MOD N perform well provided k (a constant multiplier) and N (the table size) are chosen well. They feel that most implementation of hash functions contain far-from-optimal choices [so, what else is new?]. They also state: The experiments have shown that very small variations in N can produce large variations in the efficiency of the hash-table look-up, and that the popular view, that choice of a prime number will automatically ensure a good result, is not well founded. [pp 224] They found that one of the simplest hashing functions (gnu-cpp) performed well, both in speed and number of probes: [included verbatim from gnu-cpp. k = 4, N = 1403 odd but not prime] #define HASHSIZE 1403 #define HASHSTEP(old, c) ((old << 2) + c) #define MAKE_POS(v) (v & ~0x80000000) /* make number positive */ hashf (name, len, hashsize) register U_CHAR *name; register int len; int hashsize; { register int r = 0; while (len--) r = HASHSTEP (r, *name++); return MAKE_POS (r) % hashsize; } [ps: I have not yet tested this function, so I do not know if their results are easily reproducible. For example, would making r unsigned, and avoiding MAKE_POS crock slow this function down?] enjoy.. oz --- First learn your horn and all the theory. Internet: oz@nexus.yorku.ca Next develop a style. Then forget all that uucp: utzoo/utai!yunexus!oz and just play. Charlie Parker [?] York U. CCS: (416) 736 5257