Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!perlman
From: perlman@tut.cis.ohio-state.edu (Gary Perlman)
Newsgroups: comp.text
Subject: Re: How should uniqbib look for "near" duplicates?
Message-ID: <30035@tut.cis.ohio-state.edu>
Date: 26 Dec 88 07:31:43 GMT
References: <455@rhesus.primate.wisc.edu>
Organization: Computer & Info Sci Ohio State Univ Columbus, OH 43210
Lines: 36

In article <455@rhesus.primate.wisc.edu> bin@primate.wisc.edu
>I've been in correspondence now with several people who have expressed
>an interest in looking for "near" duplicates, e.g., such as might arise
>when entries are added to bibliographic databases by different people.
>In this instance, entries may be "the same" to a human, but actually
>slightly different - a journal title might be abbreviated by one person
>and not the other.

OCLC (Online Computer Library Center) is a non-profit company formed to
establish, maintain, and operate a computerized library network (among
other things). I was told that they have a database of about 18 million
entries, mostly books. They are very interested in the problem of
detecting duplicate bibliographic records.

In their annual review of OCLC research (July 1987-June 1988), I saw
some work in that area. Tom Hickey would be a good person to ask for
pointers, although I do not think he is actively working on it. One
research project is called "Duplicate Detection and the 'Species
Problem,'" and was managed by John Bunge. The other is called
"Clustering Equivalent Bibliographic Records," and was managed by
Elaine Svenonius (a visiting scholar at the time, so she may no longer
be there).

    OCLC Online Computer Library Center, Inc.
    6565 Frantz Road
    Dublin, OH 43017
    Phone: 614-764-6000

Another area of research that seems relevant is the work at Bellcore by
Sue Dumais (with others) on Latent Semantic Indexing. They had a paper
in the last or second-to-last ACM SIGCHI Conference Proceedings.
She can probably be reached at std@bellcore.com. I do not seem to have
the physical address.
--
Gary Perlman                 Department of Computer and Information Science
perlman@cis.ohio-state.edu   The Ohio State University
614-292-2566                 2036 Neil Avenue Mall
                             Columbus, OH 43210-1277