Path: utzoo!mnetor!uunet!husc6!tut.cis.ohio-state.edu!mdf
From: mdf@tut.cis.ohio-state.edu (Mark D. Freeman)
Newsgroups: comp.databases
Subject: Duplicate elimination
Message-ID: <10496@tut.cis.ohio-state.edu>
Date: 13 Apr 88 19:28:28 GMT
Organization: StrongPoint Systems, Inc.; Columbus, OH. (guest of Ohio State U.)
Lines: 28


I am looking for algorithms to detect duplicate addresses.  We have
several databases that all share the following subset of fields:
	
	First Name
	Last Name
	Address1
	Address2
	City
	State
	Zip

We would like some way of determining whether a new record duplicates
an existing address, taking into account variations in how the address
is written (e.g. 201 Test Street and 201 Test St., or 201-B Foo Ave and
201 Foo Ave. Apt. B).

An algorithm to standardize addresses would be great too.  The post
office uses one for its free 9-digit ZIP encoding service, but I
don't know how it works.
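
To make the request concrete, here is a rough sketch in Python of the
kind of rule-based standardization I have in mind.  It is only an
illustration, not the post office's method: the abbreviation table,
the unit keywords, the record field names (address1, address2,
last_name, zip) and the function names are all assumptions made up
for this example.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
SUFFIXES = {
    "street": "st", "avenue": "ave", "av": "ave",
    "road": "rd", "drive": "dr", "boulevard": "blvd",
}
# Hypothetical keywords that introduce an apartment/unit designator.
UNIT_WORDS = {"apt", "apartment", "unit", "suite", "ste"}


def normalize_address(line):
    """Return (street, unit) in a canonical lowercase form."""
    # Drop periods and commas, fold case, split on whitespace.
    tokens = line.lower().replace(".", " ").replace(",", " ").split()
    unit = ""
    # "201-B" or "201B": split into house number 201 and unit "b".
    if tokens:
        m = re.fullmatch(r"(\d+)-?([a-z])", tokens[0])
        if m:
            tokens[0], unit = m.group(1), m.group(2)
    out = []
    i = 0
    while i < len(tokens):
        # "apt b": record the unit and drop both tokens from the street.
        if tokens[i] in UNIT_WORDS and i + 1 < len(tokens):
            unit = tokens[i + 1]
            i += 2
            continue
        # Map known suffix spellings to a single abbreviation.
        out.append(SUFFIXES.get(tokens[i], tokens[i]))
        i += 1
    return " ".join(out), unit


def match_key(record):
    """Build a comparison key; records with equal keys are candidate duplicates."""
    street, unit = normalize_address(
        record["address1"] + " " + record.get("address2", "")
    )
    # Compare on the 5-digit ZIP only, so 43202 and 43202-3014 agree.
    return (record["last_name"].strip().lower(), street, unit, record["zip"][:5])
```

With these rules, "201 Test Street" and "201 Test St." normalize to the
same value, as do "201-B Foo Ave" and "201 Foo Ave. Apt. B".  The sketch
is deliberately naive: it rewrites suffix words wherever they appear, so
a street actually named "Avenue" or "Unit" would be mangled, and it
makes no attempt at misspellings or first-name variants.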

Thanks!

-- 
Mark D. Freeman						  (614) 262-1418
					      mdf@tut.cis.ohio-state.edu
2440 Medary Avenue	   ...!cbosgd!osu-cis!tut.cis.ohio-state.edu!mdf
Columbus, OH  43202-3014      Guest account at The Ohio State University