Xref: utzoo comp.sys.ibm.pc:16886 comp.binaries.ibm.pc.d:505 comp.emacs:3746
Path: utzoo!attcan!uunet!mcvax!hafro!krafla!frisk
From: frisk@rhi.hi.is (Fridrik Skulason)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Message-ID: <345@krafla.rhi.hi.is>
Date: 1 Jul 88 12:46:52 GMT
References: <1988Jun22.223158.1366@LTH.Se> <126@dcs.UUCP> <920@infbs.UUCP>
Reply-To: frisk@krafla.UUCP (Fridrik Skulason)
Organization: University of Iceland (RHI)
Lines: 114

In article <920@infbs.UUCP> neitzel@infbs.UUCP (Martin Neitzel) writes:
>
>On the other hand, if someone
>thinks: "Hey, 7 bits in my char for the ascii code, now let's see what
>I can mess around with the 8th!" -- that's neither portable nor justified
>by K&R.
>

Agree! Another related problem is:

    "Well - nobody uses the 8th bit anyway, so ..... c &= 0x7f;"

I have even run into a couple of C compilers that assume this, MEGAMAX C
for the ATARI ST, and the C compiler for the Archimedes. This of course
means that I advise students here at the university NOT to buy the latter
machine.

Sometimes one cannot ignore the problem like this. For example, I had to
patch PC/NFS some days ago, since the terminal emulator stripped off the
8th bit. (And while doing it I added automatic translation from the PC
character set to/from ISO 8859/1)

Not so serious maybe, but still boring, are programs that produce a warning
message if any character has the 8th bit set. (PC-Kermit for example)

>
>WNP> (c) incompatible with the way European characters are implemented
>WNP> on MOST printers and ANSI terminals.
>
>Perhaps I should explain what "this way" was: The ascii characters like
>[]{}\~ were considered as "not so useful for Europeans" and their codes
>were interpreted as national characters.

Here in Iceland this used to be true until a few years ago.
Then it was decided to standardize on ISO 8859/1 (or "ECMA" code, as it was
known then). Now we are using ISO 8859/1 on our VAXes (instead of DEC
multinational) and on our HPs (instead of Roman8). Even the ATARI ST
computer uses the ISO standard (instead of a not-quite-IBM-PC-compatible
character set with Hebrew extensions).

So, now we can use [,],{,},|,\,@,^,~ and ` together with our national
characters. The same goes for all other European countries - we are moving
away from modified 7 bit ASCII to standardized 8 bit character sets.

>
>Wolf is right: IBM did not constrain itself to any standard when
>introducing their PCs.  But they made a first step into a reasonable
>direction:
>(1) Keeping 0-127 ASCII, and
>(2) Providing most of the Europeans with "their" characters.

Most of them, yes, but not all. Not, for example, all the Icelandic,
Danish, Norwegian or Portuguese characters. They corrected that on the
PS/2 series by providing the CP-850 character set, which includes all the
printable characters in ISO 8859/1 (Latin-1). Unfortunately, they are not
in the same positions as in the ISO standard.

>
>WNP> The way IBM implemented it, all case functions would have to be
>WNP> table-driven, which is much less elegant than working with the
>WNP> parallel ranges of characters in standard ASCII.
>

So let's just use a character set like ISO 8859/1, where the European
special characters are also (mostly) in parallel ranges.

One problem with parallel ranges is that some characters may only exist as
lower case, like the German ess-tzet (position DF in ISO 8859/1). The
character in position FF is y with two dots above, which is normally not
used in upper case.

>
>WNP> So all of you Europeans should lobby hardware manufacturers to
>WNP> implement foreign characters in an intelligent way, and in a
>WNP> STANDARD WAY across different architectures, and THEN you can
>WNP> reasonably expect the authors of compilers and libraries and
>WNP> tools to support these characters.
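
To make the parallel-ranges point concrete, here is a minimal sketch of
case conversion without a table. It assumes the input really is ISO 8859/1
and that the compiler uses an ASCII-compatible character set. The names
latin1_toupper and latin1_tolower are made up for illustration; they are
not from any library or compiler.

```c
/* Sketch only: hypothetical helpers, assuming ISO 8859/1 input. */

/* Lower case: a-z, plus E0-FE except F7 (the division sign). */
static int latin1_islower(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 0xE0 && c <= 0xFE && c != 0xF7);
}

/* Upper case: A-Z, plus C0-DE except D7 (the multiplication sign). */
static int latin1_isupper(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 0xC0 && c <= 0xDE && c != 0xD7);
}

/* The two ranges sit exactly 0x20 apart, just as in 7 bit ASCII.
   DF (ess-tzet) and FF (y with two dots) fall outside both ranges
   and are returned unchanged, since ISO 8859/1 has no upper case
   form for them. */
unsigned char latin1_toupper(unsigned char c)
{
    return latin1_islower(c) ? (unsigned char)(c - 0x20) : c;
}

unsigned char latin1_tolower(unsigned char c)
{
    return latin1_isupper(c) ? (unsigned char)(c + 0x20) : c;
}
```

With this, FE (lower case thorn) maps to DE and E6 (ae) maps to C6, while
DF and FF come back unchanged. Compare that with "c &= 0x7f", which turns
E6 into 66, a plain 'f'.
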
>
>The Intelligent Way will be some ascii extended to eight bits.
>

Not just "some ascii" - ISO 8859/1 (or /2 /3 /4)

>
>Authors of compilers, libraries, tools, and programming languages
>have begun to at least consider foreign character sets.  ANSI C
>is the current popular example.  While their trigraphs are just
>a superfluous wart on a wart in my opinion, their concept of
>"locales" is just the thing we all think of.  Your for
>you, mine for me.

The discussion about trigraphs in comp.lang.c some time ago looked somewhat
silly to me, since some people were arguing that we (Europeans) needed
those, when in fact we do not. After all, who cares if #define looks like
define if it functions equally.

-- 
Fridrik Skulason          University of Iceland
UUCP  frisk@rhi.uucp      BIX  frisk

This line intentionally left blank ...................