Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!umich!samsung!usc!orion.oac.uci.edu!uci-ics!gateway
From: mark@cbmark.cbcc.att.COM (Mark Horton)
Newsgroups: comp.protocols.iso.x400
Subject: Re: Dutch names in X.400 and/or RFC 1148
Message-ID: <9006130616.AA00701@cbmark.cbcc.att.com>
Date: 13 Jun 90 06:01:35 GMT
Lines: 30
Approved: usenet@PARIS.ICS.UCI.EDU

> > We have a similar problem in AT&T, where everyone is in a name
> > database. It seems to be handled by ignoring the blanks. If I
> > ask for people named "de vries", I get 6 people, 3 as "de vries"
> > and 3 different people as "devries". If I ask for "devries" I get
> > the same 6 people.
>
>So it is time for AT&T to fix their database.
>
>Note that the original requester was dealing with ``real world
>problems''. For Dutch people the name Jan van der Steen denotes
>another person than Jan vander Steen or Jan van Dersteen etc.

Exactly. In the real world, you have to deal with lots of real world
issues, such as databases full of records that were prepared by
different people using different rules, and bureaucracy that prevents
"fixing" the database. (For example, our database is the result of
merging several separate payroll databases, and it's extremely
unlikely that the payroll department is going to change the spelling
of the names of some people to match the names of other people. Just
getting these folks to store email addresses and process updates for
them is like pulling teeth.)

The real issue for us is how to design the algorithms to map among
the various possibilities. Ignoring certain characters such as blank,
apostrophe, hyphen, etc., will cause a looser match, which is exactly
what we want for a name lookup. For X.400 to 822 translation, as
someone pointed out, Jan.van_der_Steen@whatever is even better,
although you'd still want to ignore the _'s in the lookup algorithm.

	Mark
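
The looser match described above could be sketched roughly as follows.
This is only an illustration, not how the AT&T database actually works:
the function names (name_key, build_index, lookup) are made up for the
example, and the exact set of ignored characters (blank, apostrophe,
hyphen, underscore) plus the case folding are assumptions.

```python
from collections import defaultdict

# Characters dropped before comparing names. The post names blank,
# apostrophe, hyphen and underscore; the exact set is an assumption.
IGNORED = " '-_"


def name_key(name):
    """Reduce a name to a loose lookup key.

    Lowercases the name (an assumption) and drops the ignored characters,
    so "de Vries", "DeVries" and "de_vries" all map to "devries".
    """
    return "".join(ch for ch in name.lower() if ch not in IGNORED)


def build_index(names):
    """Group the stored spellings under their loose key."""
    index = defaultdict(list)
    for name in names:
        index[name_key(name)].append(name)
    return index


def lookup(index, query):
    """Return every stored spelling whose loose key matches the query."""
    return index.get(name_key(query), [])


index = build_index(
    ["de Vries", "DeVries", "van der Steen", "vander Steen", "Smith"]
)
print(lookup(index, "devries"))
print(lookup(index, "van_der_Steen"))
```

Note that this deliberately returns "van der Steen" and "vander Steen"
for the same query even though they are different people. For a name
lookup that is the point: the stored spellings stay untouched, and the
person asking picks the right entry from the candidates.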