Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP
Newsgroups: comp.std.internat
Subject: Re: Character representation
Message-ID: <2183@enea.UUCP>
Date: Fri, 14-Aug-87 17:08:55 EDT
Article-I.D.: enea.2183
Posted: Fri Aug 14 17:08:55 1987
Date-Received: Sun, 16-Aug-87 08:44:04 EDT
References: <2171@enea.UUCP> <709@maccs.UUCP>
Reply-To: sommar@enea.UUCP(Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 44

In a recent article gordan@maccs.UUCP (Gordan Palameta) writes:
>In article <2171@enea.UUCP> sommar@enea.UUCP(Erland Sommarskog) writes:
>>   I think that the simple representation for characters is completely
>>due to the dominating position of the English language in the computer
>>world. If computers had been invented in France the problem would
>>have been solved. (And if they had been Swedish, Englishmen would
>
>It gets even more complicated: in Spanish, I believe, ch is considered
>a separate letter, between c and d in alphabetical order (likewise with ll).

Perfectly true. And Spanish is not unique. Polish, for instance, has
cz, dz and rz.

>It only goes to show that alphabetical order is language-dependent, and
>identical strings will sort differently depending on locale.  The only
>general solution is to have intelligent operating system routines to
>handle sorting.

It would be preferable to have the sorting as part of the programming
language in question. For example, in Ada:
     pragma LANGUAGE(French)
The support may be in the OS, or even in the hardware for speed, but
making it part of the language increases portability.
   But this doesn't address all the problems I mentioned. How do you
construct a general character with an arbitrary accent, umlaut or other
diacritic mark? An 8-bit enumeration isn't sufficient.

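As a rough illustration of the language-dependent ordering discussed
above, here is a minimal sketch in Python. It assumes the traditional
Spanish alphabet in which ch and ll are letters of their own. The names
SPANISH_LETTERS, RANK and spanish_key are made up for this example and
do not belong to any real library, and accented vowels are not handled.

```python
# Traditional Spanish alphabet with "ch" and "ll" as separate letters.
# Illustrative table only (an assumption of this sketch, not a library API).
SPANISH_LETTERS = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(SPANISH_LETTERS)}


def spanish_key(word):
    """Map a word to a list of letter ranks, reading ch and ll as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters missing from the table sort after every known letter.
            key.append(RANK.get(word[i], len(SPANISH_LETTERS)))
            i += 1
    return key


words = ["dama", "chico", "cuna", "llama", "luz", "cola"]
by_code = sorted(words)                       # order by numeric character code
by_spanish = sorted(words, key=spanish_key)   # order by the table above

print(by_code)
print(by_spanish)
```

Sorting by character code puts "chico" before "cola" and "llama" before
"luz"; the table-driven key puts "chico" after "cuna" and "llama" after
"luz". The comparison never looks at the numeric value of a character,
only at its rank in the table, so a different language would just need
a different table.
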
>Despite 7-bit ASCII, which makes possible code such as
>     if (c >= 'A' && c <= 'Z')
>there is no reason why the numeric representation of a character should have
>anything to do with the position of that character in a collating sequence.

Right, but almost all programming today depends on it, doesn't it? It's
easier to implement and executes faster.
   The character type should be an abstract one. The actual implementation
(bit size and all) could vary from compiler to compiler, from OS to OS.
The simple numeric representation happens to work for English. For French
it doesn't. If computers had been invented in France....
-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP