Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!cairo!tut
From: tut@cairo.Sun.COM (Bill "Bill" Tuthill)
Newsgroups: comp.text
Subject: Re: SGML question
Keywords: SGML, ambiguity
Message-ID: <142145@sun.Eng.Sun.COM>
Date: 11 Sep 90 00:44:27 GMT
References: <582@helios.prosys.se> <583@helios.prosys.se> <141873@sun.Eng.Sun.COM> <1990Sep10.170717.7993@unx.sas.com>
Sender: news@sun.Eng.Sun.COM
Lines: 22

bts@unx.sas.com (Brian T. Schellenberger) writes:
> |
> |The main drawback to Unicode is that files will be twice as big.  But
> |being able to exchange data without shifting and conversion is a huge
> |advantage.  Space has even been left in the Unicode address space for
> |ancient writing systems such as hieroglyphics and cuneiform.
>
> This is by no means necessary.  Even the "large" versions of Kanji and
> such-like only have 6000 or so characters.  Allowing for overlap, I would
> be immensely surprised if you couldn't take care of all the ideographic
> living languages (mostly Chinese, Japanese, and Korean) in 15,000 characters, tops.

Wait a minute, do you mean it isn't necessary to combine Asian language
character sets?  There are over 6000 Kanji characters in Japan, about the
same number in Korea and China (PRC), and around 14000 characters in Taiwan.
That adds up to at least 32000.  Unification work by the Unicode people
has reduced that total to around 20000.

Or do you mean 16 bits aren't necessary, 15 are enough?  With 32768 code
points, there would be only 12768 empty slots after covering the Asian
languages, which is almost certainly not enough.  I really don't think
the world needs yet another shift encoding algorithm, anyway.
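
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The per-region counts are just the round figures quoted above, not authoritative repertoire sizes, and the names (counts, unified_estimate, free_slots) are made up for illustration.

```python
# Approximate character counts as quoted in the post (round figures only).
counts = {
    "Japan": 6000,
    "Korea": 6000,
    "China (PRC)": 6000,
    "Taiwan": 14000,
}

# Rough size of the combined set after unification, as quoted in the post.
unified_estimate = 20000

# Total if every regional set were encoded separately, with no overlap removed.
naive_total = sum(counts.values())


def free_slots(bits, used):
    """Code points left in a fixed-width space of `bits` bits after `used` are taken."""
    return 2 ** bits - used


print("separate sets:", naive_total)
print("free at 15 bits:", free_slots(15, unified_estimate))
print("free at 16 bits:", free_slots(16, unified_estimate))
```

With these figures the separate sets come to 32000, a 15-bit space leaves about 12768 slots for everything else, and a 16-bit space leaves 45536.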