Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!uunet!mcsun!hp4nl!cwi.nl!dik From: dik@cwi.nl (Dik T. Winter) Newsgroups: comp.lang.c Subject: Re: hash function Message-ID: <3769@charon.cwi.nl> Date: 26 Jun 91 13:25:56 GMT References: <29997.Jun2503.57.3491@kramden.acf.nyu.edu> <15047@ulysses.att.com> <17918.Jun2608.17.5091@kramden.acf.nyu.edu> Sender: news@cwi.nl Organization: CWI, Amsterdam Lines: 30 In article <17918.Jun2608.17.5091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > In article <15047@ulysses.att.com> kpv@ulysses.att.com (Phong Vo[drew]) writes: > > For everyone's entertainment, I did a simple (and admittedly contrived) test > > of hashing 1000000 numbers starting at 1000000 into 1000000 buckets. > > This is done by treating each number which is stored in a word as a > > string of 4 bytes. > > Sorry, but the mod has to be a power of 2. Why? (If you are telling me for speed I'll scream.) > > Surely you noticed that a multiplier of 256 will produce perfect hashing > in this case. Does that mean you're recommending a multiplier of 256? Only on big-endian machines of course :-) > > The profiling statistics I've saved from various compressors form a much > more realistic sample. They show 33 as slightly better than random > hashing on typical files, 37 as slightly worse. I've never tried 261 or > 31, but I'll bet 261 does worse than 33 on text files. > I did some simple tests, hashing all entries of /usr/dict/words, counting how many entries fall in each bucket. I did this with the four multipliers Dan just mentioned. I did this for 16 buckets to 8192 buckets (only powers of two done). I calculated also the standard deviations involved. I will not bore you with all those digits, the result can be stated simply: it does not matter very much what multiplier you use. For each multiplier there is a number of buckets where that multiplier has the smallest standard deviation. So use whatever you like. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl