Xref: utzoo comp.emacs:4535 comp.lang.c:13701 comp.sys.ibm.pc:20743
Path: utzoo!yunexus!geac!syntron!jtsv16!uunet!auspex!guy
From: guy@auspex.UUCP (Guy Harris)
Newsgroups: comp.emacs,comp.lang.c,comp.sys.ibm.pc
Subject: Re: Programming and international character sets.
Keywords: 8 bit characters
Message-ID: <362@auspex.UUCP>
Date: 31 Oct 88 19:09:16 GMT
Article-I.D.: auspex.362
References: <532@krafla.rhi.hi.is> <605@quintus.UUCP>
Reply-To: guy@auspex.UUCP (Guy Harris)
Followup-To: comp.lang.c,comp.sys.ibm.pc
Organization: Auspex Systems, Santa Clara
Lines: 22

>There is a Cyrillic version (I think it is 8859/2)

No, 8859/2 is another Latin set; there are four Latin alphabets
(8859/[1234], I think), and there seem to be at least drafts for Greek
and Cyrillic.

>The only time when I've wanted to do this is when stripping off a parity
>bit, and using 0xFF would be totally wrong.  The toascii() macro *might*
>be appropriate.  When you're dealing with a 7 data + 1 parity bit device,
>there is no point in pretending that you're prepared to accept anything
>other than 7 data bits.

Except that most devices can be *told* to handle 8 bits; never assume,
when you're dealing with a terminal, that you're dealing with a 7 data +
1 parity bit device (unless your software deals *only* with one specific
terminal that *can't* generate 8 bits).

>The real problem is trying to write portable code that uses character
>classes which _aren't_ in <ctype.h>.  Consider isvowel()...

Or, for that matter, consider "toupper()"; what's "toupper()" of a
German "ss" (or is it "sz") character?
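
To make the two points above concrete, here is a minimal sketch in C. The helper names strip_parity(), byte_value() and upper_byte() are made up for illustration and come from neither the article nor any library. It assumes an 8859/1-style encoding, where the sharp s is byte 0xDF, and the default "C" locale that a program starts in.

```c
#include <ctype.h>

/* Hypothetical helper: strip a parity bit, keeping the low 7 data bits.
   Only sensible when the device is known to send 7 data + 1 parity;
   on an 8-bit-clean line it silently corrupts characters above 0x7F. */
int strip_parity(int c)
{
    return c & 0x7F;
}

/* Hypothetical helper: keep all 8 bits of a byte taken from a char
   buffer, avoiding sign extension where plain char is signed. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: upper-case one byte with toupper().  The
   argument is converted to unsigned char first, because <ctype.h>
   functions require a value in that range (or EOF). */
int upper_byte(char c)
{
    return toupper((unsigned char)c);
}
```

With these, strip_parity(0xE9) turns an 8859/1 e-acute (0xE9) into a plain 'i' (0x69), which is the damage done by assuming 7 data bits on a terminal that can send 8. upper_byte() applied to byte 0xDF in the "C" locale hands the byte back unchanged, since toupper() only maps a-z there; 8859/1 has no single upper-case sharp s to map to in any case, so a one-character-in, one-character-out interface cannot give a fully satisfactory answer.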