Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!bloom-beacon!apple!voder!pyramid!prls!philabs!micomvax!ray From: ray@micomvax.UUCP (Ray Dunn) Newsgroups: comp.lang.c Subject: Re: hash algorithm Message-ID: <1246@micomvax.UUCP> Date: 31 Aug 88 17:22:13 GMT References: <654@novavax.UUCP> <2550078@hpisod2.HP.COM> <4712@b-tech.UUCP> <2392@rtech.rtech.com> Reply-To: ray@micomvax.UUCP (Ray Dunn) Organization: Philips Electronics Ltd. (TDS - Montreal) St. Laurent QC, Canada Lines: 42 In article <2392@rtech.rtech.com> daveb@llama.UUCP (Dave Brower) writes: > ... > If the collision chains generated by the naive function are short, it > may be cheaper to search the chain than use a smarter hash. > ... Definitely! > Moral: Try several ways and *measure*. The results may be surprising. Better Moral: Whenever possible analyze the properties and distributions of the data you want to handle and *then* design the hashing algorithm. At that point it is then reasonable to measure and tinker. This discussion is getting very like those on sorting - please realize there is no *general* best method, there are only best methods in specific application environments. The goal should not normally be to minimize the search time for an individual item, but to minimize the total time spent searching in a given run session or series of runs, of the application - a fact often ignored in discussions of minimizing chain lengths etc.. If the tokens "(", ")", ";" and "," each occur (i.e. require to be "looked-up") approximately n times more frequently than the tokens "while", "for", "switch" and "do", then wouldn't it be reasonable to design your algorithm so that the former items were located approximately n times faster than the latter? This can be achieved in a variety of ways, including, let us not forget, ad hoc tests for specific high frequency items prior to applying a generalized routine. In the case of a fixed data-base, for example the reserved words in a compiler, it does not take much effort to analyze a filesystem or three for token frequencies and then structure and *order* the dictionary accordingly. -- Ray Dunn. | UUCP: ..!philabs!micomvax!ray Philips Electronics Ltd. | TEL : (514) 744-8200 Ext: 2347 600 Dr Frederik Philips Blvd | FAX : (514) 744-6455 St Laurent. Quebec. H4M 2S9 | TLX : 05-824090