Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!dkuug!dkuugin!keld
From: keld@login.dkuug.dk (Keld J|rn Simonsen)
Newsgroups: comp.std.c++
Subject: Re: ISO Latin 1? (was Re: design by committee)
Message-ID:
Date: 6 Dec 90 00:12:53 GMT
References: <1016@zinn.MV.COM> <1990Nov23.211727.2802@zoo.toronto.edu> <1990Nov28.164154.5718@zoo.toronto.edu>
Sender: news@slyrf.dkuug.dk
Lines: 54

doconnor@titania.srg.UUCP (Dennis O'Connor x4982 room 6-230N) writes:

>With respect to Japanese character sets :
>There are four character sets used in Japan :
>  Kanji : 2000+ common characters, plus more uncommon ones. Most adults
>    can read and write Kanji. Every "character" stands for a complete
>    word or concept.
>  Hiragana (sp?) : about 150 characters, I think. Used to phonetically
>    spell out words that are native to the Japanese language.
>  Katakana : again, about 150 characters. Used to phonetically spell
>    out words that have been imported into Japanese from foriegn languages.

The character sets I have seen have 86 katakana and 83 hiragana
characters. But then I am talking about "encoded character sets"
like 8859-1 etc., not the "character set repertoire", which is a
more abstract thing.

There are to my knowledge several encoded character sets in use
in Japan today, including:

X0201: an 8-bit character set including almost all of ASCII
       (except backslash, which is a Yen sign) and the katakana
       characters.

X0208: a 16 (14) bit character set consisting of mathematical
       characters, Latin, hiragana, katakana, Cyrillic and Greek
       and some box-drawing characters, then the large section of
       kanji characters, which is ordered in two parts: the more
       common ones in pronunciation order (Japanese order!) and
       then the less frequent ones in radical/stroke order.
       Some 6500 kanji characters are included.

X0212: a new 16 (14) bit character set with a lot of the Latin,
       Cyrillic, Greek and kanji characters not in X0208.
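
The "16 (14) bit" remark can be made concrete with a little arithmetic. The sketch below assumes the usual ISO 2022 layout, in which each of the two bytes uses only the 94 graphic positions 0x21 to 0x7E of a 7-bit byte. The function name kuten_to_bytes and the 1-based row/cell numbering are illustrative choices made for this example, not something quoted from the standards.

```python
# Sketch: code space of a two-byte, 7-bits-per-byte character set.
# Assumption: only the 94 graphic positions 0x21..0x7E are used per byte.
FIRST, LAST = 0x21, 0x7E

POSITIONS_PER_BYTE = LAST - FIRST + 1      # 94 usable values per byte
TOTAL_POSITIONS = POSITIONS_PER_BYTE ** 2  # 94 rows of 94 cells


def kuten_to_bytes(row, cell):
    """Map a 1-based (row, cell) position to its two 7-bit bytes.

    Hypothetical helper for illustration only.
    """
    if not (1 <= row <= POSITIONS_PER_BYTE and 1 <= cell <= POSITIONS_PER_BYTE):
        raise ValueError("row and cell must both be in 1..94")
    # Offset 0x20 moves position 1 onto the first graphic byte, 0x21.
    return bytes([0x20 + row, 0x20 + cell])


if __name__ == "__main__":
    print("positions available:", TOTAL_POSITIONS)
    print("first position:", kuten_to_bytes(1, 1).hex())
    print("last position:", kuten_to_bytes(94, 94).hex())
```

So although two bytes are transmitted, only 14 of the 16 bits carry information, and even those 14 bits are not fully used: the code space has room for 8836 characters rather than 16384.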

X0201, X0208 and X0212 are published as JIS (Japanese Industrial
Standards) documents, but it seems the 16-bit sets are not widely
implemented, maybe because they demand quite a big character set.

Shift-JIS: an 8/16-bit character set which is used on PCs.
       Normal ASCII characters take just 8 bits, and some katakana
       characters also take just 8 bits. Some of the remaining
       8-bit values act as escapes that introduce a second byte,
       providing about 3000 kanji characters.

EUC: Extended UNIX Code: this is the above character sets and
       others encoded with ISO 2022 techniques, so it is possible
       to shift in and out between several character sets.

Keld Simonsen
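
As a rough illustration of the Shift-JIS and EUC descriptions above, the sketch below counts how many bytes a few sample characters need in each encoding. It uses the stock Python codecs "shift_jis", "euc_jp" and "iso2022_jp" as stand-ins; that these behave like the implementations discussed in this thread is an assumption, and the names SAMPLES and byte_lengths are made up for the example.

```python
# Sketch: bytes per character under Shift-JIS and EUC, plus an ISO 2022
# style stream with explicit shifts. Python's codecs are stand-ins here
# (an assumption), not the implementations discussed in the post.
SAMPLES = {
    "ascii letter": "A",
    "halfwidth katakana": "\uff71",  # 8-bit katakana "a", as in X0201
    "hiragana": "\u3042",            # hiragana "a", as in X0208
}


def byte_lengths(codec):
    """Return the encoded length in bytes of each sample character."""
    return {name: len(ch.encode(codec)) for name, ch in SAMPLES.items()}


sjis_lengths = byte_lengths("shift_jis")
euc_lengths = byte_lengths("euc_jp")

# With ISO 2022 shifting, an escape sequence announces the two-byte set
# and another one returns the stream to the one-byte set afterwards.
shifted_stream = "A\u3042".encode("iso2022_jp")

if __name__ == "__main__":
    print("Shift-JIS:", sjis_lengths)
    print("EUC:      ", euc_lengths)
    print("ISO 2022: ", shifted_stream)
```

The counts show the mixed 8/16-bit nature of Shift-JIS: ASCII and the 8-bit katakana stay at one byte while hiragana and kanji take two. The last line shows the escape sequences that bracket the two-byte character in a shifted stream.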