Xref: utzoo comp.unix.wizards:19869 comp.lang.c:24681
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!pyrnj!hhb!istvan
From: istvan@hhb.UUCP (Istvan Mohos)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: fuzzy strcmp
Message-ID: <297@hhb.UUCP>
Date: 22 Dec 89 11:38:39 GMT
Organization: HHB Systems, Mahwah, NJ
Lines: 28

tchrist@convexe.uucp (Tom Christiansen @ Convex Computer) writes:
>I'm looking for an algorithm that would allow me to determine
>whether two strings were similar.  Thus
>
>    "abcde" !~ "xyzzy"
>    "this old man can read" =~ "that old man can't read"
>
>... perhaps just
>    float strfzcmp(string1,string2)

I must confess, my first reaction was: thank God, Tom's finally found
a problem he can't solve in Perl. :-)

You may want to try running the *diff* algorithm along the individual
characters of the two strings (rather than applying it to successive
lines of two files); the ratio of the number of failed (unmatched)
chars to the combined byte count of the two strings is a dandy float
in the range 0. to 1., where 0. means the strings are identical and
1. means they have nothing in common.  Thus,

    strfzcmp("abcde","xyzzy") --> 1.
    strfzcmp("this old man can read","that old man can't read") --> .136363..

-- 
        Istvan Mohos
        ...uunet!pyrdc!pyrnj!hhb!istvan
HHB Systems 1000 Wyckoff Ave. Mahwah NJ 07430 201-848-8000
====================================================================
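
For what it's worth, here is one way the character-level diff ratio from
the post might be sketched in C. This is not the poster's code: it assumes
that "failed chars" means the characters of either string that fall
outside a longest common subsequence (which is what a minimal diff leaves
unmatched), and it uses the textbook dynamic-programming LCS rather than
the algorithm inside diff(1). The -1.0 return on allocation failure and
the 0.0 result for two empty strings are also assumptions made for this
illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical strfzcmp(): fraction of characters, over both strings,
 * that a character-level diff would report as unmatched.
 * 0.0 = identical, 1.0 = nothing in common.
 * Returns -1.0 if memory cannot be allocated (an assumed convention).
 */
float strfzcmp(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t i, j, lcs;
    size_t *prev, *curr, *tmp;

    if (la + lb == 0)
        return 0.0f;            /* two empty strings: call them identical */

    /* Two rows of the LCS table are enough; calloc zeroes row 0. */
    prev = calloc(lb + 1, sizeof *prev);
    curr = calloc(lb + 1, sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0f;
    }

    for (i = 1; i <= la; i++) {
        curr[0] = 0;
        for (j = 1; j <= lb; j++) {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1] + 1;      /* extend a common run */
            else
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
        }
        tmp = prev;             /* the row just filled becomes "previous" */
        prev = curr;
        curr = tmp;
    }

    lcs = prev[lb];             /* length of a longest common subsequence */
    free(prev);
    free(curr);

    /* Each string contributes (its length - lcs) unmatched characters. */
    return (float)(la + lb - 2 * lcs) / (float)(la + lb);
}
```

Under these assumptions the two examples from the post come out as
stated: "abcde" against "xyzzy" shares nothing, so 10 of 10 characters
fail; the two "old man" sentences share a 19-character subsequence out of
21 + 23 = 44 bytes, leaving 6 failed characters and a ratio of 6/44.
The table costs O(strlen(a) * strlen(b)) time, which is fine for short
strings but may be slow for long ones.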