Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!csd4.milw.wisc.edu!leah!itsgw!steinmetz!uunet!ispi!jbayer
From: jbayer@ispi.UUCP (Jonathan Bayer)
Newsgroups: comp.text
Subject: Re: How should uniqbib look for "near" duplicates?
Summary: try the Soundex algorithm
Message-ID: <371@ispi.UUCP>
Date: 28 Dec 88 22:53:01 GMT
References: <455@rhesus.primate.wisc.edu>
Organization: Intelligent Software Products, Inc.
Lines: 20

In article <455@rhesus.primate.wisc.edu>, bin@primate.wisc.edu (Brain in Neutral) writes:
> A short while ago, I posted "uniqbib", a program for eliminating
>
> Several strategies for finding near duplicates have been suggested to
> me, and I've thought of several others. I'm asking for comment from
> the net on this issue. Given two entries, how would you determine
> whether they are the same. (phrased another way, how you you estimate
> the distance between two entries?)
>

Try the Soundex algorithm. It can match two words that are spelled
differently but sound basically the same. It will not, however, match
an abbreviation with the full word.

Jonathan Bayer
Intelligent Software Products, Inc.
-- 
life used to be so simple.
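
For anyone who wants to try the Soundex suggestion above, here is a minimal sketch in Python. It assumes the common "American Soundex" variant (first letter plus three digits, with h and w ignored between equal codes); the function name soundex and the sample words are only illustrative and are not taken from uniqbib.

```python
# Sketch of American Soundex (an assumed variant; not code from uniqbib).
# Letter groups: each group shares one digit, 1 through 6.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the 4-character Soundex key for word, or "" if it has no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")  # code of the first letter also suppresses repeats
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")  # vowels and y have no code
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return key.ljust(4, "0")  # pad short keys with zeros


if __name__ == "__main__":
    # Spelling variants collapse to one key; an abbreviation does not match its full word.
    for a, b in [("Smith", "Smyth"), ("Robert", "Rupert"), ("Proc", "Proceedings")]:
        print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

Keys would probably be compared word by word (author surnames, title words) rather than over a whole entry, and abbreviated journal names would still need some other treatment, as noted above.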