Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!csd4.milw.wisc.edu!leah!itsgw!steinmetz!uunet!ispi!jbayer
From: jbayer@ispi.UUCP (Jonathan Bayer)
Newsgroups: comp.text
Subject: Re: How should uniqbib look for "near" duplicates?
Summary: try the Soundex algorithm
Message-ID: <371@ispi.UUCP>
Date: 28 Dec 88 22:53:01 GMT
References: <455@rhesus.primate.wisc.edu>
Organization: Intelligent Software Products, Inc.
Lines: 20

In article <455@rhesus.primate.wisc.edu>, bin@primate.wisc.edu (Brain in Neutral) writes:
> A short while ago, I posted "uniqbib", a program for eliminating
> 
> Several strategies for finding near duplicates have been suggested to
> me, and I've thought of several others.  I'm asking for comment from
> the net on this issue.  Given two entries, how would you determine
> whether they are the same?  (Phrased another way, how would you estimate
> the distance between two entries?)
> 

Try the Soundex algorithm.  It can match two words that are spelled
differently but sound basically the same.  It cannot match an
abbreviation with the full word, however.
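
For anyone who has not seen it, here is a rough sketch of the classic
American Soundex coding in Python.  It is only an illustration, not code
from uniqbib; the function name soundex and the table _CODES are my own
choices, and I am assuming plain ASCII words (anything that is not A-Z is
dropped before coding).

```python
# Digit assigned to each consonant group in classic American Soundex.
# Vowels, Y, H and W get no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code for word, or "" if it has no letters."""
    # Assumption: only ASCII letters matter; everything else is ignored.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # its digit still blocks a repeat
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # H and W do not separate two letters with the same digit;
        # vowels (and Y) do, so they reset prev to "".
        if ch not in "HW":
            prev = digit
    return code.ljust(4, "0")              # pad short codes with zeros
```

With this sketch, "Smith" and "Smyth" both code to S530, so two entries
that differ only in that spelling would compare equal.  "Proc" and
"Proceedings" code to P620 and P623, which shows the abbreviation
problem mentioned above.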

Jonathan Bayer
Intelligent Software Products, Inc.


-- 
life used to be so simple.