Path: utzoo!attcan!uunet!decwrl!mips!swrinde!ucsd!rutgers!rochester!pt.cs.cmu.edu!dsl.pitt.edu!pitt!willett!ForthNet
From: ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: other forth applications
Message-ID: <1744.UUL1.3#5129@willett.pgh.pa.us>
Date: 15 Sep 90 01:22:36 GMT
Organization: String, Scotch tape, and Paperclips. (in Pgh, PA)
Lines: 48

 Date: 09-13-90 (10:57)              Number: 3769 (Echo)
   To: ZAFAR ESSAK                   Refer#: 3743
 From: GENE LEFAVE                     Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 ZE>At the moment I am not sure exactly how this function can be useful.
 ZE>It does group names which at times seems close:
 ZE>  e.g. SCHMIDT, SMITH, SMYTH are all S530

Although I don't pretend to be a SOUNDEX expert, I have some experience
using it.

First, the state of Illinois uses it to generate driver license numbers.
A license number is the SOUNDEX code for your last name, a first name
code (I don't know where that comes from), and a coded birth date.

I used to use the SOUNDEX code to retrieve entries in a database. Using
SOUNDEX made the program very tolerant of spelling errors. I seem to
recall that certain database programs had this function built in.
However, English has so many short words that I found that in many cases
I was essentially searching on the first character. So I went to a
string search.

As to the basic algorithm, the idea is to use the first letter, then drop
all vowels, then group the remaining consonants into 6 sound-alike
classes. These classes are English specific, not necessarily ethnic.
Adjacent duplicates are dropped.

SCHMIDT = S530 because

  S  first character.
  C  dropped because it's the same class as S and adjacent.
  H  always dropped
  M  class 5
  I  dropped, vowel
  D  class 3
  T  dropped, adjacent class 3

The code is then padded with zeros to four characters. You can easily
work out the other names.

It's useful for names because most last names are long enough to generate
a meaningful code.
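
For anyone who wants to check the codes, here is a minimal Python sketch of the algorithm as described above. It is not from the original message. The class table is the common American SOUNDEX grouping, which is an assumption, since the text above only says there are six classes. The name `soundex`, the handling of W like H, and letting a vowel separate two letters of the same class are likewise assumptions taken from that common variant.

```python
# Letter classes of the common American SOUNDEX variant (an assumption;
# the post only says there are six sound-alike classes).
CLASSES = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
CODE = {ch: digit for letters, digit in CLASSES.items() for ch in letters}


def soundex(name: str) -> str:
    """Return a four-character code such as 'S530' (hypothetical helper)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODE.get(first, "")    # class of the previous letter, "" if none
    for ch in letters[1:]:
        if ch in "HW":
            continue              # skipped, and does not break a run of duplicates
        digit = CODE.get(ch, "")  # vowels and Y have no class
        if digit and digit != prev:
            digits.append(digit)  # keep only the first letter of an adjacent run
        prev = digit              # a vowel resets prev to ""
    # Pad with zeros and cut to the first letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


for name in ("SCHMIDT", "SMITH", "SMYTH"):
    print(name, soundex(name))
```

All three names from the quoted message come out as S530 under these assumptions.
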
Assuming a list of 1,000,000 names, SOUNDEX hashes to 5616 codes
(presumably 26 first letters x 6 x 6 x 6 digit combinations), for an
average of about 180 names per code, which would not be difficult to
resolve with a first name and birthdate, or some other type of
qualifier. You have to remember that it was originally set up for manual
searching.
---
 ~ EZ-Reader 1.13 ~
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: uunet!willett!dwp or dwp@willett.pgh.pa.us