Xref: utzoo comp.emacs:4535 comp.lang.c:13701 comp.sys.ibm.pc:20743
Path: utzoo!yunexus!geac!syntron!jtsv16!uunet!auspex!guy
From: guy@auspex.UUCP (Guy Harris)
Newsgroups: comp.emacs,comp.lang.c,comp.sys.ibm.pc
Subject: Re: Programming and international character sets.
Keywords: 8 bit characters
Message-ID: <362@auspex.UUCP>
Date: 31 Oct 88 19:09:16 GMT
Article-I.D.: auspex.362
References: <532@krafla.rhi.hi.is> <605@quintus.UUCP>
Reply-To: guy@auspex.UUCP (Guy Harris)
Followup-To: comp.lang.c,comp.sys.ibm.pc
Organization: Auspex Systems, Santa Clara
Lines: 22

>There is a Cyrillic version (I think it is 8859/2)

No, 8859/2 is another Latin set; there are four Latin alphabets
(8859/[1234], I think), and there seem to be at least drafts for Greek
and Cyrillic.

>The only time when I've wanted to do this is when stripping off a parity
>bit, and using 0xFF would be totally wrong.  The toascii() macro *might*
>be appropriate.  When you're dealing with a 7 data + 1 parity bit device,
>there is no point in pretending that you're prepared to accept anything
>other than 7 data bits.

Except that most devices can be *told* to handle 8 bits; when you're
dealing with a terminal, never assume that you're dealing with a 7
data + 1 parity bit device (unless your software deals *only* with one
specific terminal that *can't* generate 8 bits).
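
As a rough sketch of what that means in code: mask off the top bit only
when the line is known to be running 7 data + 1 parity, and pass all 8
bits through otherwise. The "line_width" type and "data_bits()" are
made-up names for illustration, not anything from a real tty interface.

```c
/* Hypothetical per-line setting; how you learn it depends on the system. */
enum line_width { LINE_7BIT_PARITY, LINE_8BIT_CLEAN };

/* Return the data bits of one received byte, given the line setting. */
int data_bits(int byte, enum line_width width)
{
    byte &= 0xFF;                 /* keep a single octet */
    if (width == LINE_7BIT_PARITY)
        byte &= 0x7F;             /* top bit is parity here, so drop it */
    return byte;                  /* otherwise all 8 bits are data */
}
```

With 0xE9 (e-acute in 8859/1) as input, the 7-bit setting yields 0x69,
which is a plain "i", while the 8-bit setting leaves 0xE9 alone.
Stripping unconditionally quietly turns one letter into another.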

>The real problem is trying to write portable code that uses character
>classes which _aren't_ in <ctype.h>.  Consider isvowel()...

Or, for that matter, consider "toupper()"; what's "toupper()" of a
German "ss" (or is it "sz") character?
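
To make the problem concrete, here is a sketch that assumes the text is
in ISO 8859/1, where that character is code 0xDF, and assumes the usual
German convention of writing it as "SS" in capitals. "upcase_latin1()"
is a hypothetical helper, not a library routine. The point is that the
result can be longer than the input, which a char-to-char "toupper()"
has no way to express.

```c
#include <ctype.h>
#include <stddef.h>

#define LATIN1_SHARP_S 0xDF   /* assumed encoding: ISO 8859/1 */

/*
 * Hypothetical helper: copy src to dst in upper case, expanding the
 * sharp s to "SS".  Stops early rather than overflow dst.  Returns the
 * number of characters stored, not counting the terminating NUL.
 */
size_t upcase_latin1(char *dst, size_t dstsize, const char *src)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        /* Cast first: a negative char must not be passed to toupper(). */
        unsigned char c = (unsigned char)*src;

        if (c == LATIN1_SHARP_S) {
            if (n + 2 >= dstsize)
                break;            /* no room for "SS" plus the NUL */
            dst[n++] = 'S';
            dst[n++] = 'S';
        } else {
            if (n + 1 >= dstsize)
                break;            /* no room for one char plus the NUL */
            dst[n++] = (char)toupper(c);
        }
    }
    if (dstsize > 0)
        dst[n] = '\0';
    return n;
}
```

Given the six-character word "Stra\337e" this produces the seven
characters "STRASSE". Note that in the default "C" locale "toupper()"
only changes a-z, so the other accented letters pass through unchanged.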