Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!apollo!sandi
From: sandi@apollo.uucp (Sandra Martin)
Newsgroups: comp.std.internat
Subject: Re: International Collating Sequence
Message-ID: <379119b2.b88e@apollo.uucp>
Date: Tue, 29-Sep-87 08:50:00 EDT
Article-I.D.: apollo.379119b2.b88e
Posted: Tue Sep 29 08:50:00 1987
Date-Received: Wed, 30-Sep-87 06:53:57 EDT
References: <2706@sol.ARPA>
Organization: Apollo Computer, Chelmsford, Mass.
Lines: 38

Lawrence Crowl @ U of Rochester, CS Dept, Rochester, NY writes:
>I submit that we need not only an international character code, but an
>international collating sequence as well. Such a sequence should be very
>simple. There should be no "double letter" rules or unnatural separation
>of accented letters from base letters. I see no reason not to embed the
>collating sequence within the numeric codes for the characters.
>
>For example, a character set meeting these criteria might have the following
>ordering:
>
> A a `A `a "A "a .A .a ... AE ae B b C c ,C ,c D d E e 'E 'e `E `e ...

I agree that an international collating sequence would be nice, but you
can't make arbitrary rules against double letters and against separating
characters with diacriticals. In Spanish, 'ch' sorts between 'c' and 'd'
in the alphabet (likewise, 'll' comes between 'l' and 'm'). How would
your sequence handle this situation? You cannot ignore it just because
it's inconvenient.

In the Swedish alphabet, a(ring), a", and o" appear AFTER 'z'. They DO
NOT sort with the unaccented a's and o's. In Danish, the 'ae' ligature
also appears near the end of the alphabet. Why should an international
collating sequence fail to recognize these realities?

A few months back, Erland Sommarskog of ENEA Data in Stockholm posted an
article to this newsgroup in which he noted (perhaps partly in jest) that
if the Swedes had invented computers, English speakers would have had to
accept the fact that 'v' and 'w' are equivalent.
As an English speaker, I'm sure you wouldn't want to accept such a
restriction. Why should people from other countries have to accept an
unnatural order for their characters?

The fact is that there is no way to construct ONE international collating
sequence. In German, the a" sorts with the other a's. In Swedish, it
sorts at the end of the alphabet. So whatever solution is invented, it
must be flexible enough to handle these realities.

Sandra Martin, Apollo Computer
UUCP: ...{mit-erl,mit-eddie,yale,uw-beaver,decvax}!apollo!sandi
ARPA: apollo!sandi@eddie.mit.edu
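The closing point, that any workable scheme must let each language supply its own order instead of baking one order into the character codes, can be sketched as per-language tailoring tables that turn a word into a tuple of weights. This is a minimal illustration in Python, not anything proposed in this thread or taken from a standard: the table names (`es_traditional`, `sv`, `da`, `de`), the `=` notation for letters that share a weight, and the helper functions are all hypothetical. It assumes Unicode letters such as ä and å in place of the a" and a(ring) notation above, ignores case, and matches multi-letter units such as Spanish 'ch' and 'll' greedily.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical tailoring tables (illustrative only): each entry is one collation
# element, either a single letter or a multi-letter unit such as Spanish "ch".
# Elements joined by "=" share one primary weight.
BASE = "a b c d e f g h i j k l m n o p q r s t u v w x y z".split()
GERMAN_UMLAUTS = {"a": "a=ä", "o": "o=ö", "u": "u=ü"}

TAILORINGS: Dict[str, List[str]] = {
    # Traditional Spanish: "ch" after "c", "ll" after "l", "ñ" after "n".
    "es_traditional": "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split(),
    # Swedish: å, ä, ö come after z.
    "sv": BASE + ["å", "ä", "ö"],
    # Danish: æ, ø, å come after z.
    "da": BASE + ["æ", "ø", "å"],
    # German: umlauted vowels share the weight of their base vowel.
    "de": [GERMAN_UMLAUTS.get(c, c) for c in BASE],
}


def build_weights(alphabet: List[str]) -> Dict[str, int]:
    """Map every collation element to its position in the tailored alphabet."""
    weights: Dict[str, int] = {}
    for weight, entry in enumerate(alphabet):
        for element in entry.split("="):
            weights[element] = weight
    return weights


WEIGHTS = {name: build_weights(alpha) for name, alpha in TAILORINGS.items()}


def collation_elements(word: str, weights: Dict[str, int]) -> List[str]:
    """Split a word into collation elements, preferring the longest known unit."""
    text = unicodedata.normalize("NFC", word).lower()  # compose a + diaeresis into ä
    longest = max(len(element) for element in weights)
    elements: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in weights or size == 1:  # unknown characters stay single
                elements.append(chunk)
                i += size
                break
    return elements


def sort_key(word: str, tailoring: str) -> Tuple[Tuple[int, ...], str]:
    """Primary weight per element; the raw word breaks ties deterministically."""
    weights = WEIGHTS[tailoring]
    unknown = len(TAILORINGS[tailoring])  # characters outside the table sort last
    primary = tuple(
        weights[e] if e in weights else unknown + ord(e)
        for e in collation_elements(word, weights)
    )
    return primary, word


if __name__ == "__main__":
    print(sorted(["dedo", "chico", "cuna"], key=lambda w: sort_key(w, "es_traditional")))
    # ['cuna', 'chico', 'dedo']
    for lang in ("de", "sv"):
        print(lang, sorted(["zon", "ära", "apa"], key=lambda w: sort_key(w, lang)))
    # de ['apa', 'ära', 'zon']
    # sv ['apa', 'zon', 'ära']
```

Because the weights live in a per-language table rather than in the character codes, the same letter ä sorts with a under the German table and after z under the Swedish one, which no single code-point ordering can do.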