Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!sranha!srava!erik
From: erik@srava.sra.co.jp (Erik M. van der Poel)
Newsgroups: comp.std.c
Subject: Re: wchar_t values
Message-ID: <1107@sranha.sra.co.jp>
Date: 10 Apr 91 05:23:54 GMT
References: <15651@smoke.brl.mil>
Sender: news@sranha.sra.co.jp
Organization: Software Research Associates, Inc., Japan
Lines: 42
Nntp-Posting-Host: srava

Sorry, I'm a bit late with this reply. Just a few minor nits:

Al Harkcom writes:
> 'c' in all three of
> the popular multibyte encodings (EUC, JIS, SJIS) is 0x63 (same as
> ASCII). The most common wide character format (UJIS) has 'c' as
> 0x0063 (ASCII in 2 bytes).

EUC is the name of the scheme, while UJIS is the name of the Japanese
EUC. UJIS is not a wchar_t encoding.

> Keld Simonsen writes:
> =}Thus the internal widechar representation of 'c' and the external
> =}multibyte representation SHOULD not be the same for character sets
> =}like ISO 10646, JIS X 0208, KS C 5601 and GB 2312.
> =}At least this should hold for characters in the C character set.
>
> Huh? This doesn't follow... It doesn't even sound correct. A single
> byte wide character set using values above 0x80 in addition to the
> ASCII characters would become difficult...

You're probably referring to the European characters with the 8th bit
up. These are not relevant in this discussion since the ANSI C wchar_t
spec explicitly refers to the basic character set, which does not
include these European characters.

> =}The reason why the Japanese have not seen the problem before with
> =}JIS X 0208, but first with 10646, is beyond my understanding.
> =}Maybe some Japanese could enlighten us (me!) on this?
>
> What 'problem' do the 'Japanese' see with ISO 10646?

Keld is referring to the problem that I brought up in the first
article in this thread. I.e. 10646 'c' does not have the same numeric
value as ASCII 'c'.
-- 
Erik M. van der Poel                                  erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan  TEL +81-3-3234-2692
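
To make the point about the basic character set concrete, here is a minimal sketch in C. It assumes the ANSI C rule that each member of the basic character set has a wchar_t value equal to its value as a lone character in an integer character constant, and it assumes the program runs in the default "C" locale. The function names literal_c_matches and converted_value_matches are hypothetical, made up for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: compares the wide literal L'c' with the plain
 * character constant 'c'. Returns 1 if their values are equal. */
int literal_c_matches(void)
{
    return L'c' == 'c';
}

/* Hypothetical helper: converts the single byte c to a wide character
 * with mbtowc() and reports whether the wide value equals the byte value.
 * Returns 1 on a match, 0 on a mismatch or a conversion failure.
 * Intended only for members of the basic character set. */
int converted_value_matches(char c)
{
    wchar_t wc;

    /* Reset the conversion state; this matters only for
     * state-dependent (shift) encodings. */
    (void)mbtowc(NULL, NULL, 0);

    /* A basic character other than '\0' should occupy exactly one byte. */
    if (mbtowc(&wc, &c, 1) != 1)
        return 0;

    return wc == (wchar_t)c;
}
```

On an implementation that follows that rule, both helpers should return 1 for 'c'. An implementation that used the 10646 value discussed in this thread as its wchar_t value for 'c' would fail the comparison, which is the mismatch described above.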