Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bbn!gatech!udel!rochester!crowl
From: crowl@cs.rochester.edu (Lawrence Crowl)
Newsgroups: comp.std.internat
Subject: International Collating Sequence
Message-ID: <2706@sol.ARPA>
Date: Mon, 28-Sep-87 13:15:32 EDT
Article-I.D.: sol.2706
Posted: Mon Sep 28 13:15:32 1987
Date-Received: Tue, 29-Sep-87 06:43:45 EDT
Reply-To: crowl@cs.rochester.edu (Lawrence Crowl)
Organization: U of Rochester, CS Dept, Rochester, NY
Lines: 38

Several posters in this group have pointed out the difficulty of satisfying
the many national collating sequences within an international character code.
There is a further problem: if I wish to collate words from several languages
(say, a list of authors), then I must pick a collating method that probably
does not include all the characters. In short, I may be forced to use some
local, non-standard collating sequence to handle all entries. How does your
bibliographic database handle foreign authors? Does it drop accents that are
not in your native alphabet?

I submit that we need not only an international character code, but an
international collating sequence as well. Such a sequence should be very
simple. There should be no "double letter" rules and no unnatural separation
of accented letters from base letters. I see no reason not to embed the
collating sequence within the numeric codes for the characters. For example,
a character set meeting these criteria might have the following ordering:

    A a `A `a "A "a .A .a ... AE ae B b C c ,C ,c D d E e 'E 'e `E `e ...

No international standard based on USASCII can support this alphabet and still
embed the collating sequence within the character codes, because USASCII
assigns contiguous codes to A-Z and then to a-z, leaving no room to interleave
the cases or the accented letters.

Note that many letter forms in Latin, Greek, and Cyrillic are the same. It is
possible to merge these three alphabets into a single alphabet. This would
involve some re-ordering of the letters from at least two of the original
alphabets, but not a great deal.
I do not know whether this is a good idea or not; I just thought I would
mention it. Of course, we still have Arabic, Hebrew, Kanji, Kana, etc. to
incorporate.

Perhaps a better approach is to start from scratch with a new character
standard, one designed from the start to accommodate international needs.
I am willing to translate my files to a new character set. Are you?

-- 
  Lawrence Crowl                716-275-9499    University of Rochester
                      crowl@cs.rochester.edu    Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl    Rochester, New York, 14627
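
To make the idea of embedding the collating sequence in the character codes concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed standard: the ORDER table is a hypothetical fragment of the ordering given in the post, the mapping of the post's ASCII notation (`A, "A, AE, ,C, 'E) to the Unicode letters below is an assumption, and the names ORDER, CODE, encode, and decode are invented for this example.

```python
# Hypothetical fragment of the ordering sketched in the post. The choice of
# Unicode letters for the post's ASCII notation is an assumption.
ORDER = ["A", "a", "À", "à", "Ä", "ä", "Æ", "æ",
         "B", "b", "C", "c", "Ç", "ç", "D", "d",
         "E", "e", "É", "é", "È", "è"]

# Each character's numeric code is simply its rank in the collating order.
CODE = {ch: rank for rank, ch in enumerate(ORDER)}


def encode(word):
    """Translate a word into the hypothetical character set."""
    return bytes(CODE[ch] for ch in word)


def decode(data):
    """Translate encoded bytes back into ordinary text."""
    return "".join(ORDER[b] for b in data)


words = ["cab", "Çà", "bad", "Ade", "àbc", "abc", "Bée", "Bed"]

# Because the codes embed the collating sequence, a plain byte comparison
# is the whole collation step; no locale tables or special rules are needed.
result = [decode(b) for b in sorted(encode(w) for w in words)]
print(result)
```

Under these assumptions the accented forms sort next to their base letters ("abc" before "àbc", "Bed" before "Bée"). Note one consequence of interleaving the cases character by character: "Bed" sorts before "bad", since every form of B precedes every form of b only at the same position in the table, not across whole words.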