Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bbn!gatech!udel!rochester!crowl
From: crowl@cs.rochester.edu (Lawrence Crowl)
Newsgroups: comp.std.internat
Subject: International Collating Sequence
Message-ID: <2706@sol.ARPA>
Date: Mon, 28-Sep-87 13:15:32 EDT
Article-I.D.: sol.2706
Posted: Mon Sep 28 13:15:32 1987
Date-Received: Tue, 29-Sep-87 06:43:45 EDT
Reply-To: crowl@cs.rochester.edu (Lawrence Crowl)
Organization: U of Rochester, CS Dept, Rochester, NY
Lines: 38

Several posters in this group have pointed out the difficulty in satisfying
the many national collating sequences within an international character code.
There is a further problem in that if I wish to collate words from several
languages (say a list of authors), then I must pick a collating method that
probably does not include all characters.  In short, I may be forced to use
some local, non-standard collating sequence to handle all entries.  How does
your bibliographic database handle foreign authors?  Does it drop accents that
are not in your native alphabet?

I submit that we need not only an international character code, but an
international collating sequence as well.  Such a sequence should be very
simple.  There should be no "double letter" rules or unnatural separation
of accented letters from base letters.  I see no reason not to embed the
collating sequence within the numeric codes for the characters.

For example, a character set meeting these criteria might have the following
ordering:

   A a `A `a "A "a .A .a  ...  AE ae B b C c ,C ,c D d E e 'E 'e `E `e ...

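No international standard based on USASCII can accommodate this ordering and
still embed the collating sequence within the character codes, because USASCII
already fixes the codes of the unaccented letters, with all capitals placed
before all lower-case letters.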
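
To make the idea concrete, here is a minimal sketch in Python.  It is not a
proposed standard.  The code values are hypothetical: they are simply the
positions of the letters in a fragment of the example ordering above, and each
letter is written as a token in the same accent-prefix notation.  The names
ORDER, CODE, encode, and collate are mine.  With codes assigned this way,
sorting needs nothing more than a numeric comparison of the codes.

```python
# Sketch only: letters use the post's accent-prefix notation
# (for example '`A' is A with a grave accent), so a word is a list of tokens.
ORDER = ["A", "a", "`A", "`a", '"A', '"a',
         "B", "b", "C", "c", ",C", ",c",
         "D", "d", "E", "e", "'E", "'e", "`E", "`e"]

# Hypothetical code assignment: a letter's code is its collating position.
CODE = {letter: n for n, letter in enumerate(ORDER)}


def encode(word):
    """Map a word (a list of letter tokens) to its hypothetical codes."""
    return bytes(CODE[letter] for letter in word)


def collate(words):
    """Sort words by comparing raw codes; no separate collating table."""
    return sorted(words, key=encode)


names = [["b", "a", "d"], ['"a', "b"], ["a", "b"], ["B", "a"], ["A", "b"]]
for name in collate(names):
    print(" ".join(name))

# Contrast: USASCII codes put every capital before every lower-case letter,
# so a plain comparison of codes sorts "Zebra" ahead of "apple".
print(sorted(["apple", "Zebra"]))
```

In this sketch the accented forms sort immediately after their base letters
and before the next base letter, which is the property argued for above.  A
real design would still have to decide how case and accents rank against each
other when words differ only in those respects.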

Note that many letter forms in Latin, Greek, and Cyrillic are the same.  It
is possible to merge these three alphabets into a single alphabet.  This will
involve some re-ordering of the letters from at least two of the original
alphabets, but not a great deal.  I do not know whether this is a good idea;
I just thought I would mention it.  Of course, we still have Arabic,
Hebrew, Kanji, Kana, etc. to incorporate.

Perhaps a better approach is to start from scratch with a new character
standard, one designed from the start to accommodate international needs.
I am willing to translate my files to a new character set.  Are you?

-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627