Path: utzoo!utgpu!watserv1!ria!ria.ccs.uwo.ca!yukngo
From: yukngo@obelix.gaul.csd.uwo.ca (Cheung Yukngo)
Newsgroups: comp.text
Subject: Re: SGML question
Message-ID:
Date: 11 Sep 90 09:30:14 GMT
References: <582@helios.prosys.se> <583@helios.prosys.se> <141873@sun.Eng.Sun.COM> <1990Sep10.170717.7993@unx.sas.com> <142145@sun.Eng.Sun.COM>
Sender: news@ria.ccs.uwo.ca
Organization: Dept. of C.S., University of Western Ontario
Lines: 29
In-reply-to: tut@cairo.Sun.COM's message of 11 Sep 90 00:44:27 GMT

In article <142145@sun.Eng.Sun.COM> tut@cairo.Sun.COM (Bill "Bill" Tuthill) writes:

bts@unx.sas.com (Brian T. Schellenberger) writes:
> |
> |The main drawback to Unicode is that files will be twice as big.  But
> |being able to exchange data without shifting and conversion is a huge
> |advantage.  Space has even been left in the Unicode address space for
> |ancient writing systems such as hieroglyphics and cuneiform.
>
> This is by no means necessary.  Even the "large" versions of Kanji and
> such-like only have 6000 or so characters.  Allowing for overlap, I would
> be immensely surprised if you couldn't take care of all the ideographic living
> languages (mostly Chinese, Japanese, and Korean) in 15,000 characters, tops.

Wait a minute, do you mean it isn't necessary to combine Asian language
character sets?  There are over 6000 Kanji characters in Japan, about the
same number in Korea and China (PRC), and around 14000 characters in Taiwan.
That adds up to at least 32000.  Efforts by the Unicode people to combine
them have reduced that total to around 20000.

Well, I don't think 6000 Chinese characters is a reasonable number.  It is
probably good enough to cover most Chinese surnames.  According to
``An Introduction to Chinese, Japanese and Korean Computing'' by Huang and
Huang, 74,000 is a reasonable number.  Granted, most of those characters are
not in common use.  But then you don't purge a word just because it is not
used in daily life; hence the size of the Oxford English Dictionary.
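
To make the arithmetic above concrete, here is a rough sketch in Python. It only adds up the figures quoted in this thread and compares them with the capacity of a fixed 16-bit code, which I am assuming is what "twice as big" refers to. The variable names and the use of Python's "utf-16-be" codec as a stand-in for such a 16-bit encoding are my own assumptions, not anything taken from the Unicode proposal.

```python
# Approximate character counts as quoted in the thread (lower bounds).
counts = {
    "Japan (Kanji)": 6000,
    "Korea": 6000,
    "China (PRC)": 6000,
    "Taiwan": 14000,
}

# Capacity of a fixed-width 16-bit code (assumption: 2 bytes per character).
CODE_SPACE_16BIT = 2 ** 16  # 65,536 positions

naive_total = sum(counts.values())  # no merging of shared characters
unified_estimate = 20000            # figure quoted for the combined set
large_repertoire = 74000            # figure attributed to Huang and Huang


def fits(n, space=CODE_SPACE_16BIT):
    """Return True if n characters fit in the given code space."""
    return n <= space


for label, n in [
    ("sum of national sets", naive_total),
    ("combined estimate", unified_estimate),
    ("Huang and Huang figure", large_repertoire),
]:
    print(f"{label}: {n} characters, fits in 16 bits: {fits(n)}")

# "Files will be twice as big": plain ASCII text stored at 2 bytes per
# character, using utf-16-be (no byte-order mark) as a stand-in.
sample = "SGML question"
ascii_size = len(sample.encode("ascii"))
wide_size = len(sample.encode("utf-16-be"))
print(f"ASCII: {ascii_size} bytes, 16-bit: {wide_size} bytes")
```

On these numbers, both the plain sum and the combined estimate fit within 65,536 positions, while a 74,000-character repertoire would not fit in a single 16-bit space.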
I don't know anything about SGML.  I hope SGML knows something about Asian
languages.