Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!umcp-cs!chris
From: chris@umcp-cs.UUCP (Chris Torek)
Newsgroups: net.sources
Subject: Re: soundex algorithm wanted
Message-ID: <3267@umcp-cs.UUCP>
Date: Thu, 4-Sep-86 10:06:47 EDT
Article-I.D.: umcp-cs.3267
Posted: Thu Sep 4 10:06:47 1986
Date-Received: Thu, 4-Sep-86 22:05:53 EDT
References: <27@houligan.UUCP> <672@bnrmtv.UUCP> <1239@whuxl.UUCP>
Organization: Computer Sci. Dept, U of Maryland, College Park, MD
Lines: 58

In article <1239@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes:
>	register char	c, lc, prev = '0';

`register int' generates better code on my compiler, and still works.

>	if (isalpha(*name)) {

First you should test isascii(*name) (a nit).

>		lc = tolower(*name);

Watch out!  Some tolower()s fail miserably if !isupper(c).

Anyway, assuming that the basic algorithm is ... sound, I would
change the driver routine, so:

	#include <ctype.h>

	#define	SDXLEN	4

	char *
	soundex(name)
		register char *name;
	{
		static char buf[SDXLEN+1];
		static char codes[] = "01230120022455012623010202";
		register int c, i = 0, prev;
		char *strcpy();

	#ifdef lint
		/* lint cannot tell that prev is set before used */
		prev = 0;
	#endif
		(void) strcpy(buf, "a000");
		while ((c = *name++) != 0 && i < SDXLEN) {
			/*
			 * Throw out non-alphabetics, and convert upper case
			 * to lower.
			 */
			if (!isascii(c) || !isalpha(c))
				continue;
			if (isupper(c))
				c = tolower(c);
			/*
			 * Non-first characters must translate to non-zero codes
			 * that are different from the previous code; throw out
			 * those that translate to zero or to prev.
			 */
			if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev))
				continue;
			buf[i++] = prev = c;
		}
		return (buf);
	}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu
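
For readers compiling with a standard C compiler, here is a minimal sketch of the same driver loop with prototypes.  It is an illustration, not part of the posted code: the name soundex_r and the caller-supplied buffer are assumptions made here so the result does not live in a static buffer, and the test `c > 127' stands in for isascii(), which is not part of ISO C.  The code table and the skip rules are copied from the routine above, and like it the sketch assumes an ASCII character set.

```c
#include <ctype.h>
#include <string.h>

#define SDXLEN 4

/*
 * Hypothetical reentrant variant of the posted routine: the caller
 * supplies out[SDXLEN + 1] instead of sharing one static buffer.
 */
char *soundex_r(const char *name, char out[SDXLEN + 1])
{
    /* One code digit per letter 'a'..'z', same table as the post. */
    static const char codes[] = "01230120022455012623010202";
    int c, i = 0, prev = 0;

    strcpy(out, "a000");    /* default result, padded with zeros */
    while ((c = (unsigned char)*name++) != 0 && i < SDXLEN) {
        /* Throw out non-ASCII and non-alphabetic characters. */
        if (c > 127 || !isalpha(c))
            continue;
        /* ISO C tolower() leaves non-upper-case values unchanged. */
        c = tolower(c);
        /*
         * After the first letter, keep only non-zero codes that
         * differ from the previously stored character.
         */
        if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev))
            continue;
        out[i++] = (char)c;
        prev = c;
    }
    return out;
}
```

Called as soundex_r("Robert", buf) with `char buf[SDXLEN + 1]', this yields "r163"; "Rupert" yields the same code.  As in the routine above, the first letter comes back in lower case, and an empty name returns the default "a000".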