Xref: utzoo comp.emacs:4520 comp.lang.c:13683 comp.sys.ibm.pc:20718
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!decwrl!sun!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.emacs,comp.lang.c,comp.sys.ibm.pc
Subject: Re: Programming and international character sets.
Keywords: 8 bit characters
Message-ID: <605@quintus.UUCP>
Date: 31 Oct 88 04:06:25 GMT
References: <532@krafla.rhi.hi.is>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 23

In article <532@krafla.rhi.hi.is> kjartan@rhi.hi.is (Kjartan R. Gudmundsson) writes:
>The problem is however that the extension is not standard.

There is an international standard for 8-bit character sets: ISO 8859.
There are several versions of 8859, just as there were several national
versions of ISO 646 (of which ASCII was only one). All versions include
ASCII as the bottom half. ISO Latin 1 (8859/1) is pretty close to DEC's
Multinational Character Set, and is supposed to cover most West European
languages (including Icelandic). There is a Cyrillic version (8859/5;
8859/2 is the Latin alphabet for East European languages) and others are
under way.

>An other bad habit of american programmers is this:
>character_value = (character_value & 0x7F )
>don't do this!! If you must, you can use 0xFF insted:
>character_value = (character_value & 0xFF )

The only time I've wanted to do this is when stripping off a parity
bit, and there using 0xFF would be totally wrong: it leaves the parity
bit in place. The toascii() macro *might* be appropriate. When you're
dealing with a 7 data + 1 parity bit device, there is no point in
pretending that you're prepared to accept anything other than 7 data bits.

The real problem is trying to write portable code that uses character
classes which _aren't_ in <ctype.h>. Consider isvowel()...
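
To make both points concrete, here is a rough sketch in C. Everything
in it is my own illustration, not anything from a standard library:
clear_parity_bit() and latin1_isvowel() are hypothetical names, the code
assumes the data is ISO 8859/1, and the set of accented vowels is a
deliberately partial, Icelandic-flavoured sample. Which letters count as
vowels depends on the language, which is exactly why no such class is
standardised.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: drop the parity bit from a byte that came from a
 * 7 data + 1 parity bit device. Only meaningful for such devices; on
 * genuine 8-bit text it destroys the top half of the character set. */
int clear_parity_bit(int c)
{
    return c & 0x7F;
}

/* Hypothetical character class, ASSUMING ISO 8859/1 encoding.
 * Like the <ctype.h> functions, the argument must be a value in the
 * range of unsigned char, so callers holding a plain char should cast
 * it to unsigned char first (plain char may be signed). */
int latin1_isvowel(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);

    switch (c) {
    /* ASCII vowels */
    case 'a': case 'e': case 'i': case 'o': case 'u':
    case 'A': case 'E': case 'I': case 'O': case 'U':
    /* Partial sample of 8859/1 vowels: a-acute, e-acute, i-acute,
     * o-acute, u-acute, o-diaeresis, ae (lower case) */
    case 0xE1: case 0xE9: case 0xED: case 0xF3: case 0xFA:
    case 0xF6: case 0xE6:
    /* The same letters in upper case */
    case 0xC1: case 0xC9: case 0xCD: case 0xD3: case 0xDA:
    case 0xD6: case 0xC6:
        return 1;
    default:
        return 0;
    }
}
```

Under those assumptions, clear_parity_bit(0xE1) turns a-acute (0xE1)
into plain 'a' (0x61), which is the damage the quoted article complains
about, while masking with 0xFF leaves the byte unchanged and so strips
no parity at all. A caller with a plain char ch would write
latin1_isvowel((unsigned char)ch).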