Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!umich!samsung!usc!orion.oac.uci.edu!uci-ics!gateway
From: mark@cbmark.cbcc.att.COM (Mark Horton)
Newsgroups: comp.protocols.iso.x400
Subject: Re:  Dutch names in X.400 and/or RFC 1148
Message-ID: <9006130616.AA00701@cbmark.cbcc.att.com>
Date: 13 Jun 90 06:01:35 GMT
Lines: 30
Approved: usenet@PARIS.ICS.UCI.EDU

> > We have a similar problem in AT&T, where everyone is in a name
> > database.  It seems to be handled by ignoring the blanks.  If I
> > ask for people named "de vries", I get 6 people, 3 as "de vries"
> > and 3 different people as "devries".  If I ask for "devries" I get
> > the same 6 people.
>
>So it is time for AT&T to fix their database.
>
>Note that the original requester was dealing with ``real world
>problems''. For Dutch people, the name Jan van der Steen denotes
>a different person than Jan vander Steen or Jan van Dersteen, etc.

Exactly.  In the real world, you have to deal with lots of real world
issues, such as databases full of records that were prepared by
different people using different rules, and bureaucracy that prevents
"fixing" the database.  (For example, our database is the result of
merging several separate payroll databases, and it's extremely unlikely
that the payroll department is going to change the spelling of the names
of some people to match the names of other people.  Just getting these
folks to store email addresses and process updates for them is like
pulling teeth.)

The real issue for us is how to design the algorithms to map among
the various possibilities.  Ignoring certain characters such as blank,
apostrophe, hyphen, etc. will cause a looser match, which is exactly
what we want for a name lookup.  For X.400 to RFC 822 translation, as
someone pointed out, Jan.van_der_Steen@whatever is even better,
although you'd still want to ignore the _'s in the lookup algorithm.
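
As a rough sketch of the kind of loose matching described above (not
the actual AT&T lookup code), the following reduces each name to a
comparison key by dropping blanks, apostrophes, hyphens, and
underscores and folding case.  The names name_key, build_index, and
lookup, and the exact set of ignored characters, are assumptions made
for illustration.

```python
import re
from collections import defaultdict

# Assumed set of ignored characters: whitespace, apostrophe, hyphen, underscore.
_IGNORED = re.compile(r"[\s'\-_]")


def name_key(name):
    """Return a loose comparison key: ignored characters removed, case folded."""
    return _IGNORED.sub("", name).casefold()


def build_index(records):
    """Group (surname, person) pairs by loose key; stored spellings are untouched."""
    index = defaultdict(list)
    for surname, person in records:
        index[name_key(surname)].append(person)
    return index


def lookup(index, query):
    """Return every person whose surname matches the query under the loose key."""
    return index.get(name_key(query), [])


if __name__ == "__main__":
    # Hypothetical records, each kept exactly as its source database spelled it.
    records = [("de Vries", "person-1"), ("DeVries", "person-2")]
    index = build_index(records)
    print(lookup(index, "de vries"))
    print(lookup(index, "devries"))
```

Both queries return the same candidates, and the stored spellings are
never rewritten, so people who really are distinct stay distinct
records; the caller still has to pick the right one from the list.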

	Mark