Path: utzoo!utgpu!attcan!uunet!seismo!esosun!cogen!celerity!ps From: ps@celerity.UUCP (Patricia Shanahan) Newsgroups: comp.lang.c Subject: Re: Efficient coding considered harmful? Message-ID: <190@celerity.UUCP> Date: 4 Nov 88 01:03:25 GMT References: <3105@hubcap.UUCP> <34112@XAIT.XEROX.COM> <1700@dataio.Data-IO.COM> <8630@smoke.ARPA> <1704@dataio.Data-IO.COM> <119@twwells.uucp> <7700@bloom-beacon.MIT.EDU> <137@twwells.uucp> <8819@smoke.BRL.MIL> Reply-To: ps@celerity.UUCP (Patricia Shanahan) Organization: FPS Computing, San Diego CA Lines: 44 In article <8819@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) ) writes: >In article <137@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes: >>#define HASHSIZE 4 /* A power of two, or else! */ > >To continue my complaint against using bitwise operators where arithmetic >is called for, let me point out that most reasonable hashing schemes work >best of the hash table size is NOT a power of two. Now, if you had used >arithmetic instead of bit-diddling throughout your hashing code, most >likely a programmer could change HASHSIZE to some useful number like 113 >and that would be all it would take to improve the hashing scheme. On >the other hand, your code would force him to either use 128 (and obtain >suboptimal performance), or else go through the code and try to put it >back in the shape it should have had from the outset. >... Although I agree with most of what you are saying, it is not safe to assume that hashing schemes necessarily work better for HASHSIZE 113 than for 128. I did some experiments to select a hashing function for use in an assembler. I extracted all identifiers from several assembly language programs and measured the symbol table look up time and other statistics for each of the hash schemes that I was considering. Hashing with a prime size reduced the number of collisions. Hashing with a power of two size reduced the total time to do the symbol table management. The reason was that the extra time to do a prime division compared to a bit masking outweighed the small savings due to reduced search length. This is an empirical result, applicable only to the machine for which I was coding, and the type of mix of input strings that resulted from extracting the identifiers. I agree that micro-efficiency should not be considered during initial coding because of this type of experience. To do a good job of performance implementation for a particular machine requires a lot more work than appears on the surface. Many trade-offs require careful analysis or measurement for the actual machine. It is usually not worth doing for most functions. ps (Patricia Shanahan) uucp : ucsd!celerity!ps arpa : ucsd!celerity!ps@nosc phone: (619) 271-9940