Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!samsung!munnari.oz.au!basser!ultima!fidogate
From: rick_jones@f616.n713.fido.oz (Rick Jones)
Newsgroups: comp.lang.c
Subject: Re: Soundex (sounds like)
Message-ID: <16807@ultima.cs.uts.oz>
Date: 14 Dec 89 10:35:24 GMT
Sender: fido@ultima.cs.uts.oz
Organization: A Fidonet node, gated through ultima.cs.uts.oz
Lines: 68

Original to: ing@hades.oz

G'day. Rather than give you the code, here's the algorithm (it's a lot
simpler to do than most people think):

A soundex code is a four character representation based on the way a
name sounds rather than the way it is spelled. Theoretically, using
this system, you should be able to index a name so that it can be found
no matter how it was spelled. The system was developed by Margaret K.
Odell and Robert C. Russell (see U.S. Patents 1261167 [1918] and
1435663 [1922]).

Every soundex code consists of a letter and three numbers, such as
B525. The letter is always the first letter of the surname. The
numbers are assigned this way:

    1 = b,p,f,v
    2 = c,s,k,g,j,q,x,z
    3 = d,t
    4 = l
    5 = m,n
    6 = r
    disregard - a,e,i,o,u,w,y,h

To figure out a surname's code, do this:

    JOHNSON - Eliminate any a,e,i,o,u,w,y,h
    JNSN    - Write the first letter, as is, followed by the codes
              found in the table above
    JNSN = J525

No matter how long or short the surname is, the soundex code is always
the first letter of the name followed by three numbers. If you have
coded the first letter and three numbers but still have more letters in
the name, ignore them. If you have run out of letters in the name
before you have three numbers, then add zeroes to the code:

    WASHINGTON = WSNGTN = W252 (ignore the ending TN)
    KUHNE      = KN     = K500 (add zeroes to the end)
    YE         = Y      = Y000 (add zeroes to the end)

Any double letters side by side should be treated as one letter. For
example LLOYD is coded as if it were spelled LOYD. GUTIERREZ is coded
as if it were GUTIEREZ.

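For anyone who does want something to start from, here is a minimal
sketch in C of the rules so far. It is an illustration only; the names
soundex() and soundex_digit() are made up for this example. It also
folds in the rule from the next paragraph (neighbouring letters with
the same code count once), since double letters fall out of the same
test. Two assumptions the description leaves open: "side by side" is
taken to mean adjacent in the original spelling, so a disregarded
letter in between starts a fresh comparison, and anything that is not a
letter is treated like a disregarded letter.

```c
#include <ctype.h>

/* Digit for a letter per the table above; '0' means "disregard".
 * Assumes a character set where 'a'..'z' are contiguous (e.g. ASCII). */
static char soundex_digit(int c)
{
    /*                         abcdefghijklmnopqrstuvwxyz */
    static const char map[] = "01230120022455012623010202";

    c = tolower(c);
    return (c >= 'a' && c <= 'z') ? map[c - 'a'] : '0';
}

/* Hypothetical helper: writes the 4-character code plus a terminating
 * NUL into out. If name contains no letter at all, out is set to the
 * empty string (an assumption; the description does not cover this). */
void soundex(const char *name, char out[5])
{
    int n = 0;
    char prev;

    /* Skip anything before the first letter. */
    while (*name != '\0' && !isalpha((unsigned char)*name))
        name++;
    if (*name == '\0') {
        out[0] = '\0';
        return;
    }

    /* The first letter is kept as is, but its code is remembered so
     * that PFISTER does not pick up a 1 for the F. */
    out[n++] = (char)toupper((unsigned char)*name);
    prev = soundex_digit((unsigned char)*name);

    for (name++; *name != '\0' && n < 4; name++) {
        char d = soundex_digit((unsigned char)*name);

        /* Emit a digit only for a coded letter whose code differs
         * from that of the letter immediately before it. */
        if (d != '0' && d != prev)
            out[n++] = d;
        prev = d;
    }

    /* Pad with zeroes if the name ran out early. */
    while (n < 4)
        out[n++] = '0';
    out[4] = '\0';
}
```

Called as soundex("Jackson", buf) with char buf[5], this should leave
"J250" in buf, matching the worked example.
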
You may have different letters side by side that have the same code
value. For example PFISTER (P & F are both 1), JACKSON (CKS are all 2).
These letters should be treated as one letter. PFISTER is coded as
PSTR (P236) and JACKSON is coded as JCN (J250).

Thus, variations in spellings or misspellings should produce the same
code number.

This material is based on "Beginning Your Genealogical Research in the
National Archives," courtesy ROOTS-BBS, CA, Brian Mavrogeorge, sysop.

If you have any trouble coding the above (hardly likely, I'd imagine),
let me know and I'll write you a piece of code compatible with SVID.
Unfortunately, my routine uses lower-level routines proprietary to my
library and it would be useless to give it to you without several other
support routines.

Hope this is of some help.

Rick Jones.

---
 * Origin: /\/\onitor \/\/orld (~~Sydney Australia~~) (Opus 3:713/616)