Xref: utzoo comp.unix.wizards:19869 comp.lang.c:24681
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!pyrnj!hhb!istvan
From: istvan@hhb.UUCP (Istvan Mohos)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: fuzzy strcmp
Message-ID: <297@hhb.UUCP>
Date: 22 Dec 89 11:38:39 GMT
Organization: HHB Systems, Mahwah, NJ
Lines: 28


tchrist@convexe.uucp (Tom Christiansen @ Convex Computer) writes:
>I'm looking for an algorithm that would allow me to determine
>whether two strings were similar.  Thus 
>
>	"abcde" !~ "xyzzy"
>	"this old man can read" =~ "that old man can't read"
>
>... perhaps just
>    float   strfzcmp(string1,string2)

I must confess, my first reaction was: thank God, Tom's finally found
a problem he can't solve in Perl.  :-)

You may want to try running the *diff* algorithm along the individual
characters of the two strings (rather than applying it to successive
lines of two files); the ratio of the number of failed chars (those
diff would delete or insert) to the combined byte count of the two
strings is a dandy float in the range 0. to 1.
Thus,
    strfzcmp("abcde","xyzzy") --> 1.
    strfzcmp("this old man can read","that old man can't read") --> .136363..
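
For the curious, here is a rough sketch of how that could look in C.
It is not lifted from diff itself. It assumes that "failed chars"
means the characters a minimal character-level diff would delete plus
the ones it would insert, which works out to the two lengths added
together minus twice the longest common subsequence. Returning double
instead of float, returning 0. for two empty strings, and returning
-1. when memory runs out are my own choices for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of strfzcmp: 0.0 means identical, 1.0 means nothing in common.
 * Assumption: failed chars = strlen(a) + strlen(b) - 2 * LCS(a, b),
 * where LCS is the length of the longest common subsequence.
 * Returns -1.0 if memory cannot be allocated (illustrative choice).
 */
double strfzcmp(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t i, j, lcs;
    size_t *prev, *cur, *tmp;

    if (la + lb == 0)
        return 0.0;             /* two empty strings: call them identical */

    /* Two rows of the LCS table; calloc zeroes row 0 and column 0. */
    prev = calloc(lb + 1, sizeof *prev);
    cur = calloc(lb + 1, sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1.0;
    }

    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1] + 1;       /* extend a common run */
            else
                cur[j] = prev[j] > cur[j - 1] ? prev[j] : cur[j - 1];
        }
        tmp = prev;             /* the row just filled becomes "previous" */
        prev = cur;
        cur = tmp;
    }

    lcs = prev[lb];
    free(prev);
    free(cur);

    /* Chars diff would delete from a plus chars it would insert from b. */
    return (double)(la + lb - 2 * lcs) / (double)(la + lb);
}
```

With the two examples above, the first pair shares no characters
(10 failed out of 10), and the second pair shares 19 of its 21 and 23
characters (6 failed out of 44), which agrees with the figures given.
The table costs time proportional to the product of the two lengths,
which is fine for short strings but not for whole files.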

-- 
        Istvan Mohos
        ...uunet!pyrdc!pyrnj!hhb!istvan
        HHB Systems 1000 Wyckoff Ave. Mahwah NJ 07430 201-848-8000