Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!mcvax!enea!sommar
From: sommar@enea.UUCP (Erland Sommarskog)
Newsgroups: comp.std.internat
Subject: Re: Please use the standards! (if there are any)
Message-ID: <2291@enea.UUCP>
Date: Sun, 20-Sep-87 05:48:17 EDT
Article-I.D.: enea.2291
Posted: Sun Sep 20 05:48:17 1987
Date-Received: Sun, 20-Sep-87 22:54:29 EDT
References: <1498@sics.se> <484@kuling.UUCP>
Reply-To: sommar@enea.UUCP(Erland Sommarskog)
Followup-To: comp.std.internat
Organization: ENEA DATA Svenska AB, Sweden
Lines: 35

andersa@kuling.UUCP (Anders Andersson) writes:
>In article <1498@sics.se> dan@sics.se (Dan Sahlin) writes:
>>Most of all, there is a standardised way of switching between coding
>>standards called ISO 2022. The European Computer Manufacturers Association
>>(ECMA) has registered about 100 character sets according to ISO 2022 and
>>the standard for registering new standards (!) ISO 2373.
>>Since among these, we find the various 8859 versions, there is a
>>standardised way to switch between them. You will also find the Arabic,
>>Hebrew and Cyrillic character sets in the ECMA register.
>
>Yes, there are escape sequences for selecting any set, and I agree that
>this is an appropriate way to represent sequential data, but what if you
>want to access portions of the text randomly in memory?

Anders' objection is very valid, I think. Escape sequences do not
make it easy to mix letters from different alphabets.

Another example is a compiler. Which characters should it accept as
part of identifier names? All letters and digits; doesn't that seem
reasonable? But with all these sets it is difficult. Take the four
8859 Latin versions. Latin 1 has letters only at code 192 and upwards,
whereas Latin 2-4 all have letters below 192 too. So the compiler
must know the escape sequences and all the standards. Of course this
is possible to implement, but somehow I think that compiler writers
are too lazy for that.
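
The difference between the Latin sets can be checked mechanically.
What follows is only a minimal sketch in Python, under two assumptions:
that Python's standard codecs iso8859_1 through iso8859_4 stand in for
Latin 1-4, and that "letter" means whatever Unicode's str.isalpha()
accepts. The helper name letters_in_range is hypothetical.

```python
def letters_in_range(codec, lo=160, hi=191):
    """Return the byte values in lo..hi that decode to a letter under codec."""
    found = []
    for code in range(lo, hi + 1):
        # errors="replace" turns unassigned positions (Latin 3 has a few)
        # into U+FFFD, which is not a letter, instead of raising an error.
        ch = bytes([code]).decode(codec, errors="replace")
        if ch.isalpha():  # assumption: Unicode's notion of "letter"
            found.append(code)
    return found


for part in (1, 2, 3, 4):
    codes = letters_in_range(f"iso8859_{part}")
    print(f"Latin {part}: {len(codes)} letter(s) in 160..191: {codes}")
```

Under that definition Latin 1 should report only a handful of symbol-like
positions (the ordinal indicators and the micro sign) in this range,
while Latin 2-4 place ordinary accented letters there. A compiler that
hard-codes one table therefore gets the other three sets wrong.
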
(And I wouldn't be surprised if someone found a use for a character
in the range 160..191 of Latin 1 as a special character, that
character being a letter in Latin 2-4.)

As an example, VAX-pascal supports the DEC multinational character
set (which is based on an old draft of Latin 1); at least the manual
says so. But what happens if you try to use a letter from the upper
half as part of an identifier? "Illegal ASCII character". Ridiculous!
-- 
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP