Path: utzoo!attcan!uunet!husc6!mailrus!csd4.milw.wisc.edu!indri!rhesus!bin
From: bin@primate.wisc.edu (Brain in Neutral)
Newsgroups: comp.text
Subject: How should uniqbib look for "near" duplicates?
Message-ID: <455@rhesus.primate.wisc.edu>
Date: 22 Dec 88 18:21:59 GMT
Organization: UW-Madison Primate Center
Lines: 24

A short while ago, I posted "uniqbib", a program for eliminating
duplicates from bibliographic databases in refer format. My original
motivation was to combine the results of several overlapping lookbib
queries and filter out those references that were hits on more than
one query. In such cases you know the duplicate entries will be
identical.

I've been in correspondence now with several people who have expressed
an interest in looking for "near" duplicates, such as might arise when
entries are added to bibliographic databases by different people. In
this instance, entries may be "the same" to a human, but actually
slightly different: a journal title might be abbreviated by one person
and not the other.

Several strategies for finding near duplicates have been suggested to
me, and I've thought of several others. I'm asking for comment from
the net on this issue. Given two entries, how would you determine
whether they are the same? (Phrased another way, how would you
estimate the distance between two entries?)

I would prefer that responses be posted. Thanks.

Paul DuBois
dubois@primate.wisc.edu    rhesus!dubois
bin@primate.wisc.edu       rhesus!bin
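
To make the "distance between two entries" question concrete, here is
one possible strategy sketched in Python. It is an illustration only,
not how uniqbib works and not one of the strategies suggested by the
correspondents. The assumptions are all mine: entries are compared
field by field on the refer keys %A, %T, %J and %D; each field is
lowercased and stripped of punctuation; similarity comes from the
standard difflib.SequenceMatcher; and the per-field weights are
hypothetical values chosen for the example.

```python
import difflib
import re

def parse_refer(entry):
    """Parse one refer entry ("%X value" lines) into {key letter: [values]}."""
    fields = {}
    key = None
    for line in entry.strip().splitlines():
        if line.startswith("%") and len(line) >= 2:
            key = line[1]
            fields.setdefault(key, []).append(line[2:].strip())
        elif key is not None:
            # Continuation line: append it to the most recent field value.
            fields[key][-1] += " " + line.strip()
    return fields

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def field_distance(a, b):
    """0.0 for identical normalized strings, approaching 1.0 as they diverge."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b))
    return 1.0 - matcher.ratio()

# Hypothetical weights: title counts most, journal least, since journal
# names are the field most likely to be abbreviated inconsistently.
WEIGHTS = {"A": 2.0, "T": 3.0, "J": 1.0, "D": 2.0}

def entry_distance(e1, e2, weights=WEIGHTS):
    """Weighted average of field distances over the fields both entries have."""
    f1, f2 = parse_refer(e1), parse_refer(e2)
    total = 0.0
    used = 0.0
    for key, weight in weights.items():
        if key in f1 and key in f2:
            total += weight * field_distance(" ".join(f1[key]), " ".join(f2[key]))
            used += weight
    # With no comparable fields there is no evidence of sameness.
    return total / used if used else 1.0

# Made-up entries, for illustration only.
full = "%A A. N. Author\n%T An Example Title\n%J Journal of Example Studies\n%D 1988"
abbrev = "%A A. N. Author\n%T An Example Title\n%J J. Example Stud.\n%D 1988"
other = "%A B. Other\n%T A Different Paper Entirely\n%J Proceedings of Something\n%D 1985"

d_same = entry_distance(full, full)
d_near = entry_distance(full, abbrev)
d_far = entry_distance(full, other)
```

With these made-up entries, the pair that differs only in the
abbreviated journal title comes out much closer than the unrelated
pair. Two entries would then be reported as near duplicates when their
distance falls below some cutoff; where to put that cutoff, and whether
field-by-field comparison is the right idea at all, is exactly the
open question.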