Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!bbn.com!papaya.bbn.com!rsalz From: rsalz@bbn.com (Rich Salz) Newsgroups: comp.lang.c Subject: Re: hash function for mac needed Message-ID: <3634@litchi.bbn.com> Date: 21 Jun 91 17:35:04 GMT References: <14207.285B768A@stjhmc.fidonet.org> <14396@dog.ee.lbl.gov> <567@smds.UUCP> <1991Jun19.161959.13677@zoo.toronto.edu> Organization: BBN Systems and Technology, Inc. Lines: 21 In <1991Jun19.161959.13677@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes: |In fact, when I originally did the hashed newsgroup-lookup code in C News |expire, a quick investigation did not reveal any hashing function which |had overall performance much better than simply using the length of the |string! (The newsgroup list has grown so much that this decision is in |need of revision, mind you.) I think you'll want to re-check. The distributions are real uneven -- there are more than 40 newsgroups of length 10, for example, so you have to resolve lots of conflicts, and if you don't put a secondary hash, that means lots of strcmp's. Of course, the design is very dependant on the newsgroup that a site receives and the distribution of articles within those sites. I use Chris Torek's hash function (amazingly good distribution and amazingly cheap to calculate) with a table size of 128 and sort the conflicts based on the highest article number. It seems to give the biggest win for the biggest number of systems. /r$ -- Please send comp.sources.unix-related mail to rsalz@uunet.uu.net. Use a domain-based address or give alternate paths, or you may lose out.