Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP
Newsgroups: comp.std.internat
Subject: Re: Character representation
Message-ID: <2201@enea.UUCP>
Date: Wed, 19-Aug-87 16:58:35 EDT
Article-I.D.: enea.2201
Posted: Wed Aug 19 16:58:35 1987
Date-Received: Sat, 22-Aug-87 09:04:07 EDT
References: <2171@enea.UUCP> <709@maccs.UUCP> <2183@enea.UUCP> <719@maccs.UUCP>
Reply-To: sommar@enea.UUCP(Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 49

In a recent article gordan@maccs.UUCP (Gordan Palameta) writes:
>t umlaut or q cedilla would probably be used very rarely, nor is it likely
>that anyone would go to the trouble of designing a font to accomodate such
>characters.  Another cost of such generality would be that accents and other
>marks would probably have to be indicated by escape sequences in conjunction
>with the unmodified letter.  This would make string-processing software more
>complicated (and slower), and text would be longer.

If you had something like the 8th bit meaning that the following byte is a
modifier, this would only moderately increase the length of the text and the
string-processing time. This solution does not, however, by itself address
the problem that different languages have different collating sequences.
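
As a rough sketch of what such a modifier-byte scheme could look like: the
encoding below is only an assumption for illustration (bit 7 set means "low
7 bits are the base letter, next byte is a modifier code"), and the names
`glyph` and `decode_glyphs` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical encoding, assumed for illustration only: if bit 7 of a
   byte is set, its low 7 bits are the base letter and the NEXT byte is a
   modifier code (accent, umlaut, ...). Otherwise the byte is a plain,
   unmodified letter. */
struct glyph {
    unsigned char base;      /* the unmodified letter */
    unsigned char modifier;  /* 0 means no modifier */
};

/* Decode n bytes from s into out. Room for n glyphs is always enough,
   since every glyph uses at least one byte. A flagged byte at the very
   end of the input (no modifier byte after it) is taken as unmodified.
   Returns the number of glyphs written. */
size_t decode_glyphs(const unsigned char *s, size_t n, struct glyph *out)
{
    size_t i = 0, count = 0;

    while (i < n) {
        out[count].base = s[i] & 0x7F;
        out[count].modifier = 0;
        if ((s[i] & 0x80) && i + 1 < n) {
            out[count].modifier = s[i + 1];
            i++;                      /* skip the modifier byte */
        }
        i++;
        count++;
    }
    return count;
}
```

Only modified letters cost an extra byte, so text with few accents grows
very little, and the decoding loop stays a single pass over the bytes.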

>Not at all, just define a 256-byte lookup table in an include file, and modify
>the code to
>     if (coll[c] >= FIRST_CHAR && coll[c] <= LAST_CHAR)
>with very little loss of efficiency.  To accomodate perverse languages like
>Spanish and Polish which insist on two-letter combinations for sorting,

Spanish and Polish aren't more perverse than English.
  Of course I know about look-up tables. I have myself written a programme
that uses a two-level look-up table for comparing words. (And the words
are transcribed in three levels. You don't want the hyphen in a hyphenated
word to be significant.) But to have that in every single programme that
does string comparisons? No, thank you. It increases the complexity and
reduces the readability of the code.
  It would be much nicer if "ch1 >= ch2" meant that ch1 comes after
or at the same position as ch2 in the alphabet we have currently chosen.
(It's unclear what equality is when modified letters are involved. Probably
you will need two kinds of equality.)
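
To make the "two kinds of equality" concrete, here is a minimal sketch of a
two-level table comparison. It is not the programme mentioned above; the
table contents, the function names and the assumption of ISO 8859-1 code
points for the accented letters are all made up for illustration.

```c
/* Two look-up tables indexed by character code. */
static unsigned char primary[256];    /* position of the base letter */
static unsigned char secondary[256];  /* tie-breaker: 0 = unmodified */

/* Fill the tables for a tiny illustrative alphabet: every code sorts as
   itself, except two accented letters that sort with plain 'e'. */
void init_tables(void)
{
    int c;

    for (c = 0; c < 256; c++) {
        primary[c] = (unsigned char)c;
        secondary[c] = 0;
    }
    /* Assumed ISO 8859-1 code points: 0xE8 e-grave, 0xE9 e-acute. */
    primary[0xE8] = 'e'; secondary[0xE8] = 1;
    primary[0xE9] = 'e'; secondary[0xE9] = 2;
}

/* First kind of equality: same letters, modifiers ignored.
   Returns <0, 0 or >0 like strcmp. */
int primary_compare(const unsigned char *a, const unsigned char *b)
{
    while (*a && primary[*a] == primary[*b]) { a++; b++; }
    return (int)primary[*a] - (int)primary[*b];
}

/* Second kind: modifiers break ties between primary-equal words. */
int collate_compare(const unsigned char *a, const unsigned char *b)
{
    int diff = primary_compare(a, b);

    if (diff != 0)
        return diff;
    /* Primary-equal words have the same length, so walk them together. */
    while (*a && secondary[*a] == secondary[*b]) { a++; b++; }
    return (int)secondary[*a] - (int)secondary[*b];
}
```

With these tables "cote" and "cot\xE9" are equal at the first level but
differ at the second, and both sort before "cotf", which a plain comparison
of the byte values would get wrong.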

>Never mind the French; what if things had turned out differently in 1588
>with the Armada, and the Spanish had invented computers?  Or the Chinese?

I just took the Frenchmen as an example, OK? No matter who had invented
the computers; if their language had also had the dominating position that
English has, that language would have set the standard for character
representation with no other language in mind. I took French as an example
since they have plenty of modifiers that are not significant for sorting.

>Followups to alt.universes.

If you find the subject that uninteresting, why did you ever write
the article at all?
-- 

Erland Sommarskog       
ENEA Data, Stockholm    
sommar@enea.UUCP