Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!execu!sequoia!uudell!bigtex!texsun!newstop!exodus!cairo.Eng.Sun.COM!tut
From: tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill)
Newsgroups: comp.text
Subject: Re: International character set requirements needed
Keywords: 8-bit data, mail
Message-ID: <5204@exodus.Eng.Sun.COM>
Date: 3 Jan 91 22:51:24 GMT
References: <1990Dec20.012516.23623@ico.isc.com>
Sender: news@exodus.Eng.Sun.COM
Lines: 28

keld@login.dkuug.dk (Keld Jørn Simonsen) writes:
>
> Is UNICODE a true subset of ISO 10646?
> Is there a well defined relation between ISO 10646 encoding and UNICODE?

ISO 10646 is still in draft form.  Both questions are impossible to
answer until 10646 gets finalized.

Disclaimer: I'm not an expert in this area.  However, extrapolating from
what I know, it appears that Unicode could be considered a 16-bit
implementation of 10646.  The ISO 10646 draft standard appears to permit
16-bit implementations of any subset thereof, for use in process code or
communication.

It just so happens that Unicode covers all Asian characters enumerated
by existing national standards, plus characters from languages that the
10646 draft hasn't even thought about.  So it may be a subset, but a
largely complete subset.

Lee Collins writes:
> Notice that 10646 would require 93,816 separate codes to cover existing
> [Chinese/Japanese/Korean] standards.  Han Unification allows Unicode to
> cover the same standards with only 18,739 unique characters.

Ken Whistler writes:
> Unicode 1.0 also includes the following scripts omitted from DIS 10646:
> Ethiopian, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
> Malayalam, Sinhalese, and Lao.

There have been attempts to convert Unicode to 10646 and back again, I
believe with mostly good results.  Of course, some data may be lost in
the translation.
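
As a quick check on the figures Lee Collins quotes above, the sketch below
compares both counts against a 16-bit code space.  It is plain arithmetic
only: the constant and function names are illustrative (not taken from
Unicode or ISO 10646), and it assumes every one of the 2^16 positions is
available, which a real 16-bit encoding would not guarantee.

```python
# Figures as quoted in the post; everything else here is an assumption
# made for illustration.
CODES_WITHOUT_UNIFICATION = 93_816  # separate codes for existing CJK standards
CODES_WITH_UNIFICATION = 18_739     # unique characters after Han Unification


def fits_in_bits(count: int, bits: int = 16) -> bool:
    """Return True if `count` distinct codes fit in a `bits`-wide code space.

    This is an upper bound: it ignores positions reserved for controls
    or other purposes.
    """
    return count <= 2 ** bits


def summarize() -> dict:
    """Collect the comparison in one place for printing or inspection."""
    return {
        "code_space": 2 ** 16,
        "unified_fits": fits_in_bits(CODES_WITH_UNIFICATION),
        "non_unified_fits": fits_in_bits(CODES_WITHOUT_UNIFICATION),
        "left_after_unified": 2 ** 16 - CODES_WITH_UNIFICATION,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

On these numbers the non-unified count alone exceeds 65,536, while the
unified count leaves most of a 16-bit space free for other scripts, which
is consistent with the point about Unicode being a 16-bit implementation.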