Path: utzoo!utgpu!attcan!uunet!seismo!esosun!cogen!celerity!ps
From: ps@celerity.UUCP (Patricia Shanahan)
Newsgroups: comp.lang.c
Subject: Re: Efficient coding considered harmful?
Message-ID: <190@celerity.UUCP>
Date: 4 Nov 88 01:03:25 GMT
References: <3105@hubcap.UUCP> <34112@XAIT.XEROX.COM> <1700@dataio.Data-IO.COM> <8630@smoke.ARPA> <1704@dataio.Data-IO.COM> <119@twwells.uucp> <7700@bloom-beacon.MIT.EDU> <137@twwells.uucp> <8819@smoke.BRL.MIL>
Reply-To: ps@celerity.UUCP (Patricia Shanahan)
Organization: FPS Computing, San Diego CA
Lines: 44

In article <8819@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <137@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
>>#define HASHSIZE 4              /* A power of two, or else! */
>
>To continue my complaint against using bitwise operators where arithmetic
>is called for, let me point out that most reasonable hashing schemes work
>best of the hash table size is NOT a power of two.  Now, if you had used
>arithmetic instead of bit-diddling throughout your hashing code, most
>likely a programmer could change HASHSIZE to some useful number like 113
>and that would be all it would take to improve the hashing scheme.  On
>the other hand, your code would force him to either use 128 (and obtain
>suboptimal performance), or else go through the code and try to put it
>back in the shape it should have had from the outset.
>...


Although I agree with most of what you are saying, it is not safe to assume
that hashing schemes necessarily work better for HASHSIZE 113 than for 128.
I did some experiments to select a hashing function for use in an assembler.
I extracted all identifiers from several assembly language programs and
measured the symbol table look up time and other statistics for each of the
hash schemes that I was considering.

Hashing with a prime size reduced the number of collisions.  Hashing with a
power of two size reduced the total time to do the symbol table management.

The reason was that the extra time to do a prime division compared to a bit
masking outweighed the small savings due to reduced search length.  This is
an empirical result, applicable only to the machine for which I was coding,
and the type of mix of input strings that resulted from extracting the
identifiers. 

I agree that micro-efficiency should not be considered during initial coding
because of this type of experience. To do a good job of performance
implementation for a particular machine requires a lot more work than
appears on the surface. Many trade-offs require careful analysis or
measurement for the actual machine. It is usually not worth doing for most
functions.

	ps
	(Patricia Shanahan)
	uucp : ucsd!celerity!ps
	arpa : ucsd!celerity!ps@nosc
	phone: (619) 271-9940