Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP (Erland Sommarskog)
Newsgroups: comp.std.internat
Subject: Re: Please use the standards! (if there are any)
Message-ID: <2291@enea.UUCP>
Date: Sun, 20-Sep-87 05:48:17 EDT
Article-I.D.: enea.2291
Posted: Sun Sep 20 05:48:17 1987
Date-Received: Sun, 20-Sep-87 22:54:29 EDT
References: <1498@sics.se> <484@kuling.UUCP>
Reply-To: sommar@enea.UUCP(Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 35

andersa@kuling.UUCP (Anders Andersson) writes:
>In article <1498@sics.se> dan@sics.se (Dan Sahlin) writes:
>>Most of all, there is a standardised way of switching between coding
>>standards called ISO 2022. The European Computer Manufacturers Association
>>(ECMA) has registered about 100 character sets according to ISO 2022 and
>>the standard for registering new standards (!) ISO 2373.
>>Since among these, we find the various 8859 versions, there is a
>>standardised way to switch between them.  You will also find the Arabic,
>>Hebrew and Cyrillic character sets in the ECMA register.
>
>Yes, there are escape sequences for selecting any set, and I agree that
>this is an appropriate way to represent sequential data, but what if you
>want to access portions of the text randomly in memory? 

Anders' objection is very valid, I think. Escape sequences don't make
it easy to mix letters from different alphabets.
  Another example is a compiler. What characters should it accept as
part of identifier names? All letters and digits; doesn't that seem
reasonable? But with all these sets it is difficult. Take the four
8859 Latin versions. Latin 1 has its letters at code 192 and upwards,
whereas Latin 2-4 all have letters below 192 too. So the compiler must
know the escape sequences and all the standards. Of course this is
possible to implement, but somehow I think compiler writers are too
lazy for that.
(And I wouldn't be surprised if someone found a use for a character in
the range 160..191 from Latin 1 as a special character, while that same
code is a letter in Latin 2-4.)
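
  To make the problem concrete, here is a minimal sketch in Python. It
assumes that "letter" means whatever Unicode classifies as alphabetic,
and it relies on Python's built-in codec tables, which follow the
published 8859 parts rather than any draft. Under that definition a few
Latin 1 codes below 192 (170, for one) also count as letters. The names
is_letter, letter_status and LATIN_SETS are made up for illustration.

```python
# Hypothetical helper names; the codec names are Python's built-in ones.
LATIN_SETS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]


def is_letter(code, charset):
    """Return True if byte value `code` is a letter in `charset`."""
    try:
        char = bytes([code]).decode(charset)
    except UnicodeDecodeError:
        # Some codes are unassigned in some of the sets.
        return False
    # Assumption: "letter" means alphabetic according to Unicode.
    return char.isalpha()


def letter_status(code):
    """Map each Latin set to whether `code` is a letter in that set."""
    return {charset: is_letter(code, charset) for charset in LATIN_SETS}


if __name__ == "__main__":
    for code in (0xA1, 0xC0):
        print(code, letter_status(code))
```

Code 161 is the inverted exclamation mark in Latin 1 but a letter in
each of Latin 2-4, while code 192 is a letter in all four. A lexer that
looks at the byte alone cannot tell which case it has; it needs to know
which set is currently selected.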
  As an example, VAX Pascal supports the DEC Multinational Character
Set (which is based on an old draft of Latin 1), or at least the manual
says so. But what happens if you try to use a letter from the upper
half as part of an identifier? "Illegal ASCII character". Ridiculous!
-- 

Erland Sommarskog       
ENEA Data, Stockholm    
sommar@enea.UUCP