Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!cairo!tut
From: tut@cairo.Sun.COM (Bill "Bill" Tuthill)
Newsgroups: comp.text
Subject: Re: SGML question
Keywords: SGML, ambiguity
Message-ID: <142145@sun.Eng.Sun.COM>
Date: 11 Sep 90 00:44:27 GMT
References: <582@helios.prosys.se> <583@helios.prosys.se> <141873@sun.Eng.Sun.COM> <1990Sep10.170717.7993@unx.sas.com>
Sender: news@sun.Eng.Sun.COM
Lines: 22

bts@unx.sas.com (Brian T. Schellenberger) writes:
> |
> |The main drawback to Unicode is that files will be twice as big.  But
> |being able to exchange data without shifting and conversion is a huge
> |advantage.  Space has even been left in the Unicode address space for
> |ancient writing systems such as hieroglyphics and cuneiform.
> 
> This is by no means necessary.  Even the "large" versions of Kanji and
> such-like only have 6000 or so characters.  Allowing for overlap, I would be
> immensely surprised if you couldn't take care of all the ideographic living
> languages (mostly Chinese, Japanese, and Korean) in 15,000 characters, tops.

Wait a minute, do you mean it isn't necessary to combine Asian language
character sets?  There are over 6000 Kanji characters in Japan, about the
same number in Korea and China (PRC) and around 14000 characters in Taiwan.
That adds up to at least 32000.  Unification work by the Unicode people
has reduced that total to around 20000.

Or do you mean 16 bits aren't necessary, 15 are enough?  That means there
would only be 12768 empty slots (32768 minus 20000) after covering the
Asian languages, which is almost certainly not enough.  I really don't
think the world needs yet another shift encoding algorithm, anyway.
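
For anyone who wants to check the arithmetic, here is a rough sketch in
Python.  The per-region counts are just the approximate figures quoted
above, not authoritative numbers, and the names (counts, free_slots) are
made up for illustration.

```python
# Approximate character counts as quoted in this post (assumed, not official).
counts = {"Japan": 6000, "Korea": 6000, "China (PRC)": 6000, "Taiwan": 14000}

# Total if every national set is encoded separately, with no sharing.
separate_total = sum(counts.values())

# Rough size after unifying characters shared between the sets (figure from the post).
unified_total = 20000

def free_slots(bits, used):
    """Code points left in a fixed-width code of `bits` bits after `used` are taken."""
    return 2 ** bits - used

print(separate_total)                  # 32000
print(free_slots(15, unified_total))   # 12768
print(free_slots(16, unified_total))   # 45536
```

With 16 bits there are roughly 45000 slots left for everything else; with
15 bits there are fewer than 13000.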