Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!apple!bionet!agate!ucbvax!bloom-beacon!eru!hagbard!sunic!dkuug!dkuugin!keld
From: keld@login.dkuug.dk (Keld Jørn Simonsen)
Newsgroups: comp.std.c
Subject: Re: wchar_t values
Message-ID:
Date: 5 Apr 91 16:23:36 GMT
References: <1006@sranha.sra.co.jp> <15651@smoke.brl.mil>
Sender: news@slyrf.dkuug.dk
Lines: 65

harkcom@spinach.pa.yokogawa.co.jp writes:
>In article keld@login.dkuug.dk
> (Keld Jørn Simonsen) writes:
> =}JIS X 0208 (basic Japanese 16-bit standard) /035/099
> JIS X 0208 doesn't cover the ASCII characters. It has a double
>sized (zenkaku) English character set though. 'c' in all three of
>the popular multibyte encodings (EUC, JIS, SJIS) is 0x63 (same as
>ASCII). The most common wide character format (UJIS) has 'c' as
>0x0063 (ASCII in 2 bytes).

I understand what Al is saying: row 2 in the Japanese, Chinese and
Korean basic 16-bit character sets, which all contain what looks to me
like complete ASCII, is in fact not ASCII but double-sized English
characters. When encoding text, at least in Japan, programmers usually
combine the 16-bit character set with ASCII in an 8/16-bit (or
7/14-bit) encoding. (I do not always have great luck in saying what I
think other people mean :-( )

> I don't know the encodings for the Chinese & Korean well, but the
>standards don't seem to cover 'c'...

I have my information from the ECMA registry of character sets, and I
really doubt that this information is incorrect or that I have misread
it.

> =}None of these values have the nice property of having ASCII 'c'
> =}extend into these values when loading as a 16-bit or 32-bit int.
> See above...

My points still hold. You could have trouble handling wide characters
in the clean 16-bit de jure standards. Apparently people out there do
not program wide characters in these character sets (true 16-bit) but
always combine them with other character sets.
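As a rough illustration of the "extend into" property discussed above, here is a minimal C sketch. It assumes an ASCII execution character set (so 'c' is 0x63) and reads /035/099 as the decimal byte pair 35, 99, i.e. 0x2363. The names UJIS_C, JISX0208_C, extends_to16 and extends_to32 are hypothetical; the code only compares numeric values and does no real encoding conversion (C11 is assumed for static_assert).

```c
#include <assert.h>
#include <stdint.h>

/* Code values for 'c' taken from the quoted discussion (hypothetical names). */
#define UJIS_C      0x0063u              /* UJIS: ASCII 'c' stored in 2 bytes */
#define JISX0208_C  ((35u << 8) | 99u)   /* JIS X 0208 position /035/099      */

/* /035/099 are decimal byte values: 35 = 0x23, 99 = 0x63. */
static_assert(JISX0208_C == 0x2363u, "/035/099 should be 0x2363");

/* Returns 1 if zero-extending the single-byte character ch to 16 bits
   yields exactly wide_code, i.e. the ASCII value "extends" into the
   wide code; returns 0 otherwise. */
int extends_to16(char ch, uint16_t wide_code)
{
    uint16_t widened = (uint16_t)(unsigned char)ch;
    return widened == wide_code;
}

/* The same check against a 32-bit wide code. */
int extends_to32(char ch, uint32_t wide_code)
{
    uint32_t widened = (uint32_t)(unsigned char)ch;
    return widened == wide_code;
}
```

On an ASCII system, extends_to16('c', UJIS_C) is 1 while extends_to16('c', JISX0208_C) is 0: zero-extended ASCII lands on the UJIS code but not on the double-sized JIS X 0208 'c'.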
> =}think there is a problem
> =}and they have not yet been able to solve it.
> A problem with ISO 10646? A problem with the 'East-asian de jure'
>character sets in reference to wchar_t?

WG14 has received a letter from SC2 pointing out an apparent problem
with 10646: the characters of the C repertoire in 10646 canonical form
differ from the corresponding sign-extended single-byte characters.
WG14 has given me the action item of responding to SC2.

> Your apparent knowledge of the JIS standard shows you have little
>room to point...

Well, my knowledge can always be improved. Still, the facts I have
presented about 16-bit character sets are true. They may be irrelevant,
since in practice these sets are used in combination with other
character sets in an encoding. And the whole problem of using 10646
(and other multibyte character sets) in wide-character strings may be
non-existent.

I really hope there is no problem; then we need not make changes
anywhere. But we should write some explanation of how this is supposed
to work, as quite a few people have had problems with it. I think the
best place for such interpretations is the forthcoming ISO C addendum.

Keld Simonsen
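The wchar_t requirement behind SC2's letter can be sketched the same way. C89 specifies that each member of the basic character set has a wide code equal to its value as the lone character in an integer character constant, and it guarantees that basic characters stored in a char are positive; an extended (high-bit) byte in a signed plain char, however, becomes negative when sign-extended. The sketch below is hypothetical: the function names are invented, 'c' is assumed to be ASCII 0x63, and HYPOTHETICAL_PREFIX models a four-octet canonical form purely for illustration, not the actual encoding of any 10646 draft (C11 is assumed for static_assert).

```c
#include <assert.h>
#include <limits.h>

/* C guarantees basic characters are positive when stored in a char,
   so widening them is never affected by sign extension. */
static_assert((char)'c' > 0, "basic characters are positive in a char");

/* Hypothetical 4-octet prefix with nonzero upper octets, used only to
   model a canonical form in which 'c' is not 0x63. Not a real layout. */
#define HYPOTHETICAL_PREFIX 0x20202000UL

/* Returns 1 if plain char is signed on this implementation. */
int plain_char_is_signed(void) { return CHAR_MIN < 0; }

/* Sign extension: a signed byte converted directly to a wider type. */
long widen_sign(signed char ch) { return (long)ch; }

/* Zero extension: the byte value taken as unsigned char. */
long widen_zero(unsigned char ch) { return (long)ch; }

/* Hypothetical canonical-form code for a single byte value. */
unsigned long hypothetical_canonical(unsigned char b)
{
    return HYPOTHETICAL_PREFIX | b;
}

/* The property at issue: a basic character's wide code should equal the
   value of the corresponding integer character constant. */
int matches_char_constant(int char_constant, unsigned long wide_code)
{
    return char_constant >= 0 && (unsigned long)char_constant == wide_code;
}
```

Here matches_char_constant('c', 0x63UL) holds, but it fails for hypothetical_canonical('c'); likewise the byte 0xE9 widens to 233 through unsigned char but to -23 when held in a signed char, which is why the sign-extended form and any unsigned wide code can disagree for extended characters.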