Path: utzoo!utgpu!watserv1!watmath!att!occrsh!uokmax!apple!usc!cs.utexas.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!dsl.pitt.edu!pitt!willett!ForthNet
From: ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie)
Newsgroups: comp.lang.forth
Subject: other forth applications
Message-ID: <1717.UUL1.3#5129@willett.pgh.pa.us>
Date: 12 Sep 90 03:26:43 GMT
Organization: String, Scotch tape, and Paperclips.  (in Pgh, PA)
Lines: 82


 Date: 09-09-90 (21:56)              Number: 3743 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 I have been experimenting with the SOUNDEX utility described by Ron 
 Braithwaite in Forth Dimensions (FD) X/3 and X/4 (1988).  I modified it 
 slightly so that it works without a string stack and is compatible 
 with F-PC: 

 \ SOUNDEX.TXT  Ron Braithwaite "Using A String Stack" FD X/3 p.15 (1988) 

 (( 
 The idea behind SOUNDEX goes back to Robert Russell's 1918 patent; it 
 was later used to index U.S. census records so that names that sound 
 alike could be found together.  The algorithm for $SOUNDEX came from 
 Guy Kelly. 

 )) 

 ONLY FORTH ALSO DEFINITIONS 

 DECIMAL 

 : C>SNDX ( ascii--char2) 
     DUP 96 > IF 32 - THEN               \ fold lowercase (a = 97) to uppercase 
     65 - 0 MAX 26 MIN 
     ( ABCDEFGHIJKLMNOPQRSTUVWXYZ ) 
     " 012301200224550126230102020" DROP + C@ ; 

 CREATE sndx.buf ( --$adr) ," 0000" 

 : >SOUNDEX ( adr1,n--$adr2)              \ 0000 <= $adr2 <= Z999 
     0 sndx.buf C!   sndx.buf 1+ 4 ASCII 0 FILL 
     ?DUP 
         IF  OVER C@ 
             DUP 96 > IF 32 - THEN       \ fold lowercase (a = 97) to uppercase 
             DUP sndx.buf 1+ C!          \ store first character 
                 1 sndx.buf C+!          \ as start of $soundex 
             C>SNDX -ROT                 \ earlier character's sndx 
             BOUNDS 1+ 
             ?DO I C@ C>SNDX             \ old,new 
                 TUCK = 
                 OVER ASCII 0 = OR 0= 
                     IF DUP sndx.buf COUNT + C! 1 sndx.buf C+! 
                     THEN sndx.buf C@ 4 = ?LEAVE 
             LOOP 
         THEN DROP 
     sndx.buf 4 OVER C! ; 

 : $SOUNDEX ( $adr1--$adr2)              \ 0000 <= $adr2 <= Z999 
     COUNT >SOUNDEX ; 


 CR .( cr pad dup 20 expect cr span @ ) CR CR 
 CR .( >SOUNDEX cr count type space ) CR 

 ====================================================== 

 Can anyone tell me whether I have inadvertently introduced any errors 
 in this translation? 
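
 One way to check a translation like this is to model the two words in
 another language and compare outputs on known names.  The Python sketch
 below is such a model, not part of the original listing: the names
 CODES, c_to_sndx and to_soundex are invented here, and it assumes the
 intended behavior is to fold lowercase letters to uppercase and to treat
 every non-letter as code 0.

```python
# Same 27-character table as the Forth listing: digits for A..Z,
# plus a final '0' that catches everything that is not a letter.
CODES = "012301200224550126230102020"


def c_to_sndx(ch):
    """Model of C>SNDX: map one character to its code digit."""
    c = ord(ch)
    if ord("a") <= c <= ord("z"):
        c -= 32                          # fold lowercase to uppercase
    index = min(max(c - 65, 0), 26)      # clamp like 0 MAX 26 MIN
    return CODES[index]


def to_soundex(name):
    """Model of >SOUNDEX: return a 4-character code such as 'S530'."""
    buf = ["0"] * 4                      # buffer pre-filled with '0'
    if not name:
        return "".join(buf)              # empty input gives '0000'
    first = name[0]
    if "a" <= first <= "z":
        first = chr(ord(first) - 32)
    buf[0] = first                       # first character is kept as-is
    count = 1
    old = c_to_sndx(first)               # code of the earlier character
    for ch in name[1:]:
        new = c_to_sndx(ch)
        # Store only codes that are non-zero and differ from the
        # code of the character immediately before.
        if new != old and new != "0":
            buf[count] = new
            count += 1
        old = new
        if count == 4:
            break
    return "".join(buf)


if __name__ == "__main__":
    for n in ("SCHMIDT", "smith", "ACTON"):
        print(n, to_soundex(n))
```

 Running the Forth word and this model over the same list of names and
 comparing the codes would show quickly whether the two disagree anywhere.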

 Assuming I have not, I applied the above code to 2,000 names from an 
 existing database and have been examining the results.  At the moment 
 I am not sure exactly how this function can be useful.  It does group 
 names that at times seem close: 
         e.g. SCHMIDT, SMITH, SMYTH are all S530 
 But at other times it groups names that seem less alike: 
         e.g. ACTON, ASHDOWN, AUSTIN are all A235. 
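
 The grouping can be reproduced outside the database with a dictionary
 keyed on the code.  This is a minimal Python sketch under the same
 coding rules as the listing above; the helper names (code, soundex,
 group_by_code) are hypothetical.

```python
from collections import defaultdict

CODES = "01230120022455012623010202"     # digits for A..Z


def code(ch):
    """Code digit for one character; non-letters count as '0'."""
    if "a" <= ch <= "z":
        ch = chr(ord(ch) - 32)
    return CODES[ord(ch) - 65] if "A" <= ch <= "Z" else "0"


def soundex(name):
    """First character plus up to three code digits, padded with '0'."""
    if not name:
        return "0000"
    first = name[0]
    if "a" <= first <= "z":
        first = chr(ord(first) - 32)
    out = first
    prev = code(first)
    for ch in name[1:]:
        cur = code(ch)
        if cur != "0" and cur != prev:   # skip zeros and repeated codes
            out += cur
        prev = cur
    return (out + "000")[:4]


def group_by_code(names):
    """Return {code: [names...]}, keeping the input order in each list."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return dict(groups)


if __name__ == "__main__":
    sample = ["SCHMIDT", "SMITH", "SMYTH", "ACTON", "ASHDOWN", "AUSTIN"]
    for key, members in sorted(group_by_code(sample).items()):
        print(key, ", ".join(members))
```

 Listing the groups this way, largest first, makes it easier to judge how
 often a code lumps together names that do not sound alike.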

 I have wondered if the ethnic origin of names might affect the 
 weighting used in the definitions above.  Any comments would be 
 welcomed. 
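
 One point concerns the rules rather than the weighting.  The words
 above code H and W as 0, the same as vowels, so a consonant that
 follows H or W is always stored.  The rule used for the U.S. census
 indexes instead skips H and W without letting them separate two
 consonants that share a code, which is why ASHCRAFT is indexed there
 as A261.  The Python sketch below compares the two behaviors; the
 function names and the hw_transparent flag are made up for
 illustration.

```python
CODES = "01230120022455012623010202"     # digits for A..Z


def code(ch):
    """Code digit for one uppercase character; non-letters are '0'."""
    return CODES[ord(ch) - 65] if "A" <= ch <= "Z" else "0"


def soundex(name, hw_transparent=False):
    """4-character code; hw_transparent=True applies the H/W rule."""
    name = name.upper()
    if not name:
        return "0000"
    out = name[0]
    prev = code(name[0])
    for ch in name[1:]:
        if hw_transparent and ch in "HW":
            continue                     # H/W: not coded, run not broken
        cur = code(ch)
        if cur != "0" and cur != prev:
            out += cur
        prev = cur
    return (out + "000")[:4]


if __name__ == "__main__":
    for n in ("ASHCRAFT", "ASHDOWN"):
        print(n, soundex(n), soundex(n, hw_transparent=True))
```

 With the flag off, ASHCRAFT comes out as A226; with it on, A261.
 ASHDOWN is A235 either way, so this rule does not explain the A235
 grouping noted above.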

 Zafar. 

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   