Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP
Newsgroups: comp.std.internat
Subject: Re: Character representation
Message-ID: <2183@enea.UUCP>
Date: Fri, 14-Aug-87 17:08:55 EDT
Article-I.D.: enea.2183
Posted: Fri Aug 14 17:08:55 1987
Date-Received: Sun, 16-Aug-87 08:44:04 EDT
References: <2171@enea.UUCP> <709@maccs.UUCP>
Reply-To: sommar@enea.UUCP(Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 44

In a recent article gordan@maccs.UUCP (Gordan Palameta) writes:
>In article <2171@enea.UUCP> sommar@enea.UUCP(Erland Sommarskog) writes:
>>  I think that the simple representation for characters is completely
>>due to the dominating position of the English language in the computer
>>world. If computers had been invented in France the problem would
>>have been solved. (And if they had been Swedish, Englishmen would
>
>It gets even more complicated:  in Spanish, I believe, ch is considered
>a separate letter, between c and d in alphabetical order (likewise with ll).

Perfectly true. And Spanish is not unique. Polish, for instance, has cz,
dz and rz.
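
To make the point about ch and ll concrete, here is a rough sketch in
Python of a sort key that treats them as single letters. The alphabet
table and the name spanish_key are my own assumptions for illustration;
accented vowels and other details of real Spanish collation are ignored.

```python
# Traditional Spanish alphabet with "ch" and "ll" as single letters.
# Illustrative only: accented vowels are deliberately left out.
SPANISH_LETTERS = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j",
                   "k", "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s",
                   "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(SPANISH_LETTERS)}
DIGRAPHS = {"ch", "ll"}


def spanish_key(word):
    """Split a word into collation units and return the list of their ranks."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # A digraph counts as one letter with its own rank.
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the table sort after all letters.
            key.append(RANK.get(word[i], len(SPANISH_LETTERS)))
            i += 1
    return key


words = ["dama", "chico", "cuna", "llama", "luz", "cama"]
by_code = sorted(words)                     # plain character-code order
by_letter = sorted(words, key=spanish_key)  # ch and ll as letters
print(by_code)
print(by_letter)
```

With character-code order "chico" lands between "cama" and "cuna"; with
the key above it moves after "cuna", and "llama" moves after "luz".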

>It only goes to show that alphabetical order is language-dependent, and
>identical strings will sort differently depending on locale.  The only
>general solution is to have intelligent operating system routines to
>handle sorting.

It would be preferable to have the sorting as part of the programming
language in question. For example, in Ada one could imagine a pragma such as:
   pragma LANGUAGE(French)
The support may be in the OS - or even the hardware for speed - but
making it part of the language increases portability. But this doesn't
address all the problems I mentioned. How does one construct a general
character with an arbitrary accent, umlaut or other diacritical mark? An
8-bit enumeration isn't sufficient.

>Despite 7-bit ASCII, which makes possible code such as
>   if (c >= 'A' && c <= 'Z')
>there is no reason why the numeric representation of a character should have
>anything to do with the position of that character in a collating sequence.

Right, but almost all programming today depends on it, doesn't it?
It's easier to implement and executes faster. Even so, the character type
should be an abstract one. The actual implementation (bit size and all)
could vary from compiler to compiler, from OS to OS.
  The simple numeric representation happens to work for English. For
French it doesn't. If computers had been invented in France....
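
A small sketch of the difference, in Python rather than C because its
strings carry accented letters directly. The helper is_upper_by_code is
a name I made up; it mirrors the quoted range test, while str.isupper
asks the character type itself.

```python
def is_upper_by_code(c):
    # Mirrors the C test  c >= 'A' && c <= 'Z'  on the numeric code.
    return "A" <= c <= "Z"


# The range test works for English letters but misses accented capitals.
for c in ["A", "Z", "É", "Å"]:
    print(c, is_upper_by_code(c), c.isupper())

# Sorting by numeric code has the same problem: 'é' has a higher code
# than 'z', so "école" ends up after "zèbre".
french = ["école", "zèbre", "eau"]
by_code = sorted(french)
print(by_code)
```

The range test rejects the accented capitals that isupper accepts, and
the code-order sort places "école" last, which no French dictionary
would do.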

-- 

Erland Sommarskog       
ENEA Data, Stockholm    
sommar@enea.UUCP