Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.1 6/24/83; site csd1.UUCP Path: utzoo!linus!decvax!harpo!floyd!cmcl2!csd1!condict From: condict@csd1.UUCP (Michael Condict) Newsgroups: net.lang Subject: Re: hash function generator - (nf) Message-ID: <121@csd1.UUCP> Date: Tue, 4-Oct-83 17:10:54 EDT Article-I.D.: csd1.121 Posted: Tue Oct 4 17:10:54 1983 Date-Received: Fri, 7-Oct-83 00:59:03 EDT References: <1984@hp-pcd.UUCP> Organization: New York University Lines: 39 Yes, there is are several ways to generate such hash functions. One such method was reported in Communications a few years back (1979?) in an article by my old friend Richard Cichelli (sorry, Richard, you're not that old). If he is on the net I am sure he will be happy to respond to your request. Meanwhile I'll tell you what I can remember. The assumption is slightly different from what you stated in that we allow the hash function to produce any output it wants (between 1 and n), with the objective, of course, of minimizing collisions and maximizing density of the hash table. So the input to the generator is a list of character strings and an allowed range 1..n, rather than a list of string/number pairs. A hash function is said to be perfect if it produces no collisions over the entire set of allowed inputs. It is said to be minimal if it produces a completely dense hash table, that is, if its set of outputs form a contiguous run of numbers. As I recall, Cichelli's method would produce a minimal perfect hash function in a quite reasonable amount of time (a few seconds or minutes to get one for the 32 Pascal reserved words, using a PDP/11). The form of the hash function was very simple. It consisted of h(str) = code1[str[1]] + code2[str[strleng]] + strleng; where str is the array of characters that is the input to the hash function, strleng is its length, and code1, code2 are two arrays that are produced by the generator. They are of type: array [char] of integer; Their purpose is to convert the first and last characters of the string into numbers that, when added with the string length will produce a unique number for each string. Of course one could use the 2nd, 3rd or other characters in str as well, but remember that only the first and last are guaranteed to exist. The problem, clearly, is that for some sets of inputs the combination of first char, last char and length is not unique, in which case the method will fail to produce a perfect hash function. Sorry I cannot remember the algorithm at all, but this should tell you if it is what you are looking for, and it shouldn't be hard to find in CACM. Michael Condict ...!cmcl2!csd1!condict New York U.