Path: utzoo!utgpu!watserv1!watmath!att!occrsh!uokmax!apple!usc!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!dsl.pitt.edu!pitt!willett!ForthNet
From: ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: other forth applications
Message-ID: <1717.UUL1.3#5129@willett.pgh.pa.us>
Date: 12 Sep 90 03:26:43 GMT
Organization: String, Scotch tape, and Paperclips. (in Pgh, PA)
Lines: 82

 Date: 09-09-90 (21:56)              Number: 3743 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

I have been experimenting with the utility SOUNDEX described by Ron
Braithwaite in FD X/3 & 4 in 1988.  I modified it slightly for use
without a string stack and to be compatible with F-PC as follows:

\ SOUNDEX.TXT   Ron Braithwaite "Using A String Stack" FD X/3 p.15 (1988)

(( The whole idea of SOUNDEX dates back to the 1894 U.S. census when
   they wanted to be able to find names that sounded alike.
   The algorithm for $SOUNDEX came from Guy Kelly. ))

ONLY FORTH ALSO DEFINITIONS DECIMAL

: C>SNDX   ( ascii--char2)
        DUP 96 > IF  32 -  THEN         \ convert to uppercase
        65 - 0 MAX 26 MIN
        ( ABCDEFGHIJKLMNOPQRSTUVWXYZ )
        " 012301200224550126230102020" DROP + C@ ;

CREATE sndx.buf   ( --$adr)   ," 0000"

: >SOUNDEX   ( adr1,n--$adr2)           \ 0000 <= $adr2 <= Z999
        0 sndx.buf C!  sndx.buf 1+ 4 ASCII 0 FILL
        ?DUP
        IF   OVER C@ DUP 96 > IF  32 -  THEN    \ convert to uppercase
             DUP sndx.buf 1+ C!                 \ store first character
             1 sndx.buf C+!                     \ as start of $soundex
             C>SNDX -ROT                        \ earlier character's sndx
             BOUNDS 1+
             ?DO  I C@ C>SNDX                   \ old,new
                  TUCK = OVER ASCII 0 = OR 0=
                  IF   DUP sndx.buf COUNT + C!
                       1 sndx.buf C+!
                  THEN
                  sndx.buf C@ 4 = ?LEAVE
             LOOP
        THEN DROP sndx.buf 4 OVER C! ;

: $SOUNDEX   ( $adr1--$adr2)            \ 0000 <= $adr2 <= Z999
        COUNT >SOUNDEX ;

CR .( cr pad dup 20 expect cr span @ ) CR CR
CR .( >SOUNDEX cr count type space ) CR

======================================================

Now I am wondering if anyone can tell me if I have inadvertently
introduced any errors in this translation?

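One way to check the translation is to run a few names through >SOUNDEX
and compare the output with codes worked out by hand from the table in
C>SNDX.  The following is only a minimal sketch.  It assumes that F-PC's
" leaves an address and length inside a colon definition (as C>SNDX
already relies on), and the names .SNDX and SNDX-CHECK are hypothetical
helpers, not part of the original listing:

```forth
\ Hypothetical helper: type a name, then the code >SOUNDEX builds for it.
: .SNDX   ( adr,n-- )
        2DUP TYPE ."  -> "
        >SOUNDEX COUNT TYPE CR ;

\ Hypothetical check word.  The expected codes in the comments were
\ worked by hand from the table in C>SNDX; they are not captured output.
: SNDX-CHECK   ( -- )
        CR
        " SCHMIDT" .SNDX        \ expect S530
        " SMITH"   .SNDX        \ expect S530
        " ACTON"   .SNDX        \ expect A235
        " LEE"     .SNDX        \ expect L000
        ;
```

After loading, typing SNDX-CHECK should list each name with its code.
A mismatch against the hand-worked values would point to the stack
handling inside the ?DO loop as the first place to look.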
Assuming I have not, I have taken the above code and applied it to 2,000
names from an existing database and have been examining the results.  At
the moment I am not sure exactly how this function can be useful.  It
does group names that at times seem close:
        e.g.  SCHMIDT, SMITH, SMYTH     are all S530
But at other times names such as:
              ACTON, ASHDOWN, AUSTIN    are grouped as A235.

I have wondered if the ethnic origin of names might affect the weighting
used in the definitions above.  Any comments would be welcomed.

Zafar.
---
 * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: uunet!willett!dwp or dwp@willett.pgh.pa.us