Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!dclsic!sjc!leia!harkcom
From: harkcom@spinach.pa.yokogawa.co.jp
Newsgroups: comp.std.internat
Subject: Re: wchar_t values
Message-ID:
Date: 15 Apr 91 04:07:00 GMT
References: <1130@sranha.sra.co.jp>
Sender: news@leia.pa.yokogawa.co.jp
Followup-To: comp.std.internat
Organization: Yokogawa Electric Corporation, Tokyo, Japan
Lines: 34
In-reply-to: erik@srava.sra.co.jp's message of 12 Apr 91 06:16:14 GMT

In article <1130@sranha.sra.co.jp> erik@srava.sra.co.jp
(Erik M. van der Poel) writes:

=}Also, you refer to "the JIS standard". This is rather misleading,
=}since several implementations use *two* JIS standards, namely JIS X
=}0208 (Kanji, etc) and the right-hand part of JIS X 0201 (`half-sized'
=}Katakana, etc).

Actually 3 popular codesets are JIS standard 0201, 0208, and 0212.
JIS X 0212 is a set of additional kanzi.

=}Perhaps we're getting confused because we are looking at different
=}documents.
=} [...]
=}He refers to codesets 1, 2 and 3 (i.e. not only 0208
=}Kanji, etc).

Yes, I'm looking at the documentation from various software
packages which use the UJIS encoding. They refer to four code sets:

        G0: ASCII
        G1: KANZI (JIS X 0208)
        G2: HANKAKU (JIS X 0201)
        G3: GAIZI

All four code sets are 16 bits wide.

=}According to this paper, UJIS is not a 2 byte code. It is an encoding
=}in which characters require 1, 2 or 3 bytes each. I.e. it is an mb
=}code, definitely not a wc code.

I hate to disagree, but all of the implementations I have seen
which use a mb encoding refer to the Japanese EUC as EUC and the wc
encodings refer to it as UJIS (except of course HP which refers to
both as UJIS). Al