Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!dclsic!sjc!leia!harkcom
From: harkcom@spinach.pa.yokogawa.co.jp
Newsgroups: comp.std.internat
Subject: Re: wchar_t values
Message-ID: <HARKCOM.91Apr15130700@spinach.pa.yokogawa.co.jp>
Date: 15 Apr 91 04:07:00 GMT
References: <HARKCOM.91Apr11091529@spinach.pa.yokogawa.co.jp>
	<1130@sranha.sra.co.jp>
Sender: news@leia.pa.yokogawa.co.jp
Followup-To: comp.std.internat
Organization: Yokogawa Electric Corporation, Tokyo, Japan
Lines: 34
In-reply-to: erik@srava.sra.co.jp's message of 12 Apr 91 06:16:14 GMT

In article <1130@sranha.sra.co.jp> erik@srava.sra.co.jp
   (Erik M. van der Poel) writes:

 =}Also, you refer to "the JIS standard". This is rather misleading,
 =}since several implementations use *two* JIS standards, namely JIS X
 =}0208 (Kanji, etc) and the right-hand part of JIS X 0201 (`half-sized'
 =}Katakana, etc).

   Actually, three popular code sets are JIS X 0201, JIS X 0208, and
JIS X 0212. JIS X 0212 is a set of additional kanzi.

 =}Perhaps we're getting confused because we are looking at different
 =}documents.
 =} [...]
 =}He refers to codesets 1, 2 and 3 (i.e. not only 0208
 =}Kanji, etc).

   Yes, I'm looking at the documentation from various software packages
which use the UJIS encoding. They refer to four code sets:
   G0:	ASCII
   G1:	KANZI	(JIS X 0208)
   G2:	HANKAKU	(JIS X 0201)
   G3:	GAIZI
All four code sets are 16 bits wide.
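
   To make "16 bits wide" concrete, here is a rough sketch of one way
four code sets can share a 16-bit value. The packing is an assumption
on my part for illustration (the top bit of each byte tags the set);
the names wc16, wc_make and wc_codeset are made up and are not taken
from any of the packages' documentation.

```c
#include <assert.h>

/* Hypothetical 16-bit wide-character type; not any vendor's wchar_t. */
typedef unsigned short wc16;

enum codeset { G0_ASCII, G1_KANZI, G2_HANKAKU, G3_GAIZI };

/*
 * Assumed packing (illustration only): the top bit of each byte of the
 * 16-bit value says which code set the character belongs to.
 *   G0: neither top bit set      0x00xx
 *   G1: both top bits set        0x8080 | hi7 << 8 | lo7
 *   G2: low-byte top bit only    0x0080 | lo7
 *   G3: high-byte top bit only   0x8000 | hi7 << 8 | lo7
 * hi7 and lo7 are the 7-bit row and column values; hi7 is ignored for
 * the single-byte sets G0 and G2.
 */
static wc16 wc_make(enum codeset cs, unsigned hi7, unsigned lo7)
{
    assert(hi7 <= 0x7F && lo7 <= 0x7F);
    switch (cs) {
    case G0_ASCII:   return (wc16)lo7;
    case G1_KANZI:   return (wc16)(0x8080u | (hi7 << 8) | lo7);
    case G2_HANKAKU: return (wc16)(0x0080u | lo7);
    case G3_GAIZI:   return (wc16)(0x8000u | (hi7 << 8) | lo7);
    }
    return 0;
}

/* Recover the code set from the two tag bits. */
static enum codeset wc_codeset(wc16 wc)
{
    switch (wc & 0x8080u) {
    case 0x0000u: return G0_ASCII;
    case 0x8080u: return G1_KANZI;
    case 0x0080u: return G2_HANKAKU;
    default:      return G3_GAIZI;   /* 0x8000 */
    }
}
```

Under this assumed packing, the JIS X 0208 character at row/column
0x30/0x21 becomes the single 16-bit value 0xB0A1, and every character
from any of the four sets occupies exactly one 16-bit unit.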

 =}According to this paper, UJIS is not a 2 byte code. It is an encoding
 =}in which characters require 1, 2 or 3 bytes each. I.e. it is an mb
 =}code, definitely not a wc code.

   I hate to disagree, but every implementation I have seen that uses
an mb encoding calls the Japanese encoding EUC, while those that use a
wc encoding call it UJIS (except, of course, HP, which calls both UJIS).
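
   For comparison, the mb form Erik describes (1, 2 or 3 bytes per
character) can be sketched as below. The byte ranges are the commonly
documented Japanese EUC layout as I understand it (SS2 = 0x8E and
SS3 = 0x8F as single-shift prefixes), not something quoted from either
document; euc_char_len and euc_strlen are names invented for this
sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SS2 0x8E   /* single shift 2: prefix for the hankaku set */
#define SS3 0x8F   /* single shift 3: prefix for the third set   */

/*
 * Byte length (1, 2 or 3) of the EUC character whose first byte is
 * s[0], or 0 if s[0] cannot start a character.
 */
static size_t euc_char_len(const unsigned char *s)
{
    unsigned char c;

    assert(s != NULL);
    c = s[0];
    if (c < 0x80)                return 1;  /* ASCII                 */
    if (c == SS2)                return 2;  /* SS2 + one byte        */
    if (c == SS3)                return 3;  /* SS3 + two bytes       */
    if (c >= 0xA1 && c <= 0xFE)  return 2;  /* two-byte kanzi set    */
    return 0;                               /* not a valid lead byte */
}

/*
 * Count characters (not bytes) in a NUL-terminated EUC string.
 * Returns (size_t)-1 if the string is malformed or truncated.
 */
static size_t euc_strlen(const unsigned char *s)
{
    size_t n = 0, len, i;

    while (*s != 0) {
        len = euc_char_len(s);
        if (len == 0)
            return (size_t)-1;
        for (i = 1; i < len; i++)
            if (s[i] == 0)
                return (size_t)-1;          /* cut off mid-character */
        s += len;
        n++;
    }
    return n;
}
```

The point of the sketch is only that in the mb form a character's
length depends on its lead byte, so counting characters means walking
the string, whereas in a wc form every character is one fixed-width
unit.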

Al