Xref: utzoo comp.emacs:4537 comp.lang.c:13710 comp.sys.ibm.pc:20748
Path: utzoo!mnetor!george
From: george@mnetor.UUCP (George Hart)
Newsgroups: comp.emacs,comp.lang.c,comp.sys.ibm.pc
Subject: Re: Programming and international character sets.
Keywords: 8 bit characters
Message-ID: <4776@mnetor.UUCP>
Date: 2 Nov 88 16:21:25 GMT
References: <532@krafla.rhi.hi.is>
Reply-To: george@mnetor.UUCP (George Hart)
Organization: Computer X (CANADA) Ltd., Toronto, Ontario, Canada
Lines: 51

In article <532@krafla.rhi.hi.is> kjartan@rhi.hi.is (Kjartan R. Gudmundsson) writes:
>
>How difficult is it to convert american/english programs so that they can
>be used to handle foreign text?

If you just need to handle full 8 bit characters, it is merely painful.
If you need to handle multibyte characters (e.g. Kanji) or a mix of
character sets, it is excruciating.

>In other european countries than England
>the ASCII character set is also widely used but with extension.
>The character set is 8 bit thus allowing 256 characters.
>The problem is however that the extension is not standard.

There is, of course, the ISO 8859 family of 8 bit character sets, each
of which contains ASCII as a proper subset.

> < excerpts of MicroEmacs code >
>
>Ugly isn't it?

Yes. vi and the Bourne shell were (and are) other offenders. I believe
recent releases of SysV have cleaned up the naughty uses of the 8th bit.

> < sample ctype.h invocations >
>
>This code is better (most of the is.. things are macros that mask
>the argument and return the binary mask that is either zero or positive)
>has more style to it and is easier to port to a different character set.

Unfortunately, the results of the macros are undefined unless isascii(c)
is true (nonzero), which sort of defeats the spirit of what you intend.
Of course, you could develop an 8 bit ctype.h compatible with a
particular 8 bit character set.
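As a rough illustration of what such an 8 bit replacement might look
like, here is a minimal sketch for ISO 8859-1 only. The l1_ function
names are hypothetical and belong to no real library, and the sketch
assumes that only codes 0xC0 through 0xFF count as 8 bit letters. The
last assertion in the self-test also shows what masking with 0x7F does
to an accented letter.

```c
#include <assert.h>

/* Hypothetical ISO 8859-1 (Latin-1) classifiers. The l1_ names are
 * illustrative only; they are not part of any standard library.
 * Simplification: only 0xC0..0xFF count as 8-bit letters, so the
 * ordinal indicators (0xAA, 0xBA) and the micro sign (0xB5) are
 * treated as non-letters. */

/* Nonzero if c is an upper-case Latin-1 letter. */
int l1_isupper(int c)
{
    /* Fold negative values (from a signed char) into 0..255. */
    unsigned char u = (unsigned char)c;

    if (u >= 0x41 && u <= 0x5A)                 /* A..Z */
        return 1;
    return u >= 0xC0 && u <= 0xDE && u != 0xD7; /* 0xD7 is the multiplication sign */
}

/* Nonzero if c is a lower-case Latin-1 letter. */
int l1_islower(int c)
{
    unsigned char u = (unsigned char)c;

    if (u >= 0x61 && u <= 0x7A)                 /* a..z */
        return 1;
    return u >= 0xDF && u != 0xF7;              /* 0xF7 is the division sign */
}

/* Nonzero if c is any Latin-1 letter. */
int l1_isalpha(int c)
{
    return l1_isupper(c) || l1_islower(c);
}

/* Lower-case equivalent of c, or c (folded to 0..255) if there is none. */
int l1_tolower(int c)
{
    unsigned char u = (unsigned char)c;

    return l1_isupper(u) ? u + 0x20 : u;
}

/* Upper-case equivalent of c, or c (folded to 0..255) if there is none.
 * Sharp s (0xDF) and y-diaeresis (0xFF) have no upper-case form in
 * Latin-1, so they are returned unchanged. */
int l1_toupper(int c)
{
    unsigned char u = (unsigned char)c;

    if (l1_islower(u) && u != 0xDF && u != 0xFF)
        return u - 0x20;
    return u;
}

/* Small sanity check of the functions above. */
void l1_self_test(void)
{
    assert(l1_isalpha(0xE9));           /* e-acute is a letter */
    assert(!l1_isalpha(0xD7));          /* multiplication sign is not */
    assert(l1_tolower(0xC9) == 0xE9);   /* E-acute -> e-acute */
    assert((0xE9 & 0x7F) == 0x69);      /* masking turns e-acute into 'i' */
}
```

Range tests were chosen over a 256-entry table only to keep the sketch
short; a table indexed by an unsigned char is probably closer to how a
real ctype.h would do it.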

>Another bad habit of american programmers is this:
>character_value = (character_value & 0x7F )

This has more to do with assumptions about the character sets supported
by the system than with nationality. Historically, assuming an ASCII
environment was not unreasonable. While this is no longer true, until
vendors and standards bodies get off their collective pots and develop
practical character sets and conventions for multilingual environments
(including multibyte characters), things will remain confused,
fragmented, and incompatible.

-- 
Regards.....George Hart, Computer X Canada Ltd.
UUCP: {utzoo,uunet}!mnetor!george
BELL: (416)475-8980