Path: utzoo!attcan!ncrcan!scocan!ron From: ron@sco.COM (Ron Irvine) Newsgroups: comp.lang.c Subject: hash function for text in C Message-ID: <1990Oct19.165408.27856@sco.COM> Date: 19 Oct 90 20:54:08 GMT Reply-To: ron@sco.COM (Ron Irvine) Organization: SCO Canada, Inc. (formerly HCR Corporation) Lines: 49 I have used many hash functions. Each one has advantages and disadvantages. Here is a powerful but simple one. It simply does a crc16 on the character string. The crc16 is a great hash function since it is designed to produce a unique code for different character strings. So for "reasonable" text it gives a good distribution of hash numbers (a big problem to solve in general). The crc16 function I list below uses nibbles to build the crc from each byte, this is faster than a bit by bit and less room than a 256 entry table needed for a byte by byte (fits in cache). So, it is fast (no multiply, just shifts and masks), well behaved and simple. If you have a VAX you could even use the "crc" instruction instead of the crc16 function - what a CISC. To use, do a crc16 on the string and the truncate the 16 bit result to the size you need (this should be a power of 2). /* crc16 - crc16 routine * * R.K. Irvine * * This routine returns the crc16 for the string pointed * to by "in". * crc16 is given by: x^16 + x^15 + x^2 + 1 */ unsigned short crc16(register char *in) { register unsigned int n, crc; static unsigned short crc16l[] = { 0x0000,0xc0c1,0xc181,0x0140, 0xc301,0x03c0,0x0280,0xc241, 0xc601,0x06c0,0x0780,0xc741, 0x0500,0xc5c1,0xc481,0x0440, }; static unsigned short crc16h[] = { 0x0000,0xcc01,0xd801,0x1400, 0xf001,0x3c00,0x2800,0xe401, 0xa001,0x6c00,0x7800,0xb401, 0x5000,0x9c01,0x8801,0x4400, }; crc = 0; while (n = (unsigned char)(*in++)) { n ^= crc; crc = crc16l[n&0x0f] ^ crc16h[(n>>4)&0x0f] ^ (crc>>8); } return(crc); }