Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!sdd.hp.com!decwrl!sun-barr!newstop!sun!cairo!tut
From: tut@cairo.Sun.COM (Bill "Bill" Tuthill)
Newsgroups: comp.text
Subject: Re: SGML question
Keywords: SGML, ambiguity
Message-ID: <141873@sun.Eng.Sun.COM>
Date: 5 Sep 90 21:09:10 GMT
References: <555@helios.prosys.se> <146@thor.UUCP> <582@helios.prosys.se> <583@helios.prosys.se>
Sender: news@sun.Eng.Sun.COM
Lines: 32

In article <583@helios.prosys.se>, ath@prosys.se (Anders Thulin) writes:
>
> > SGML only provides text portability, which ASCII does more elegantly.
>
> ASCII isn't much of a help if I want to insert a Swedish { in the
> text.  (That '{', of course, is an 'a' with an umlaut accent).

I probably should have said ISO 8859-1 (also known as ISO Latin-1),
not ASCII, but I didn't think many people would know what that is.
ISO Latin-1 is an 8-bit code set identical to ASCII in the lower half,
but extended into the upper half with accent marks and characters
required in western Europe.  Anders could produce any Scandinavian
character using this extended code set.

However, ISO Latin-1 doesn't solve the character encoding problem
outside western Europe.  There are ISO standards for eastern Europe
(ISO 8859-2), Greece (ISO Greek), and Russia (ISO Cyrillic).  There
are also standards for Japan (JIS) and elsewhere in Asia.  But these
code sets are not mutually compatible; they are not interchangeable.

I believe the ultimate answer is Unicode, a 16-bit code set that
includes all known languages of the world in a single, interchangeable
code set.  Developed by Joe Becker and others at Xerox, Apple, and
elsewhere, Unicode represents a tremendous leap forward.  The main
reason 16 bits is sufficient is that Chinese, Japanese and Korean
pictographs have been combined so as to be complete and correctly
ordered, though not necessarily contiguous.

The main drawback to Unicode is that files will be twice as big.
But being able to exchange data without shifting and conversion is a huge advantage. Space has even been left in the Unicode address space for ancient writing systems such as hieroglyphics and cuneiform.
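
To make the size tradeoff concrete, here is a rough sketch in Python.
It is only an illustration under stated assumptions: the helper names
encoded_sizes and fits_latin1 are made up for this example, and
Python's "utf-16-be" codec stands in for the fixed 16-bit encoding
described above (the two agree for the characters used here).

```python
# Illustrative sketch only; helper names are hypothetical, and
# "utf-16-be" is an assumed stand-in for a fixed 16-bit code set.

def encoded_sizes(text):
    """Return (latin1_bytes, utf16_bytes) for text that fits in Latin-1."""
    return len(text.encode("latin-1")), len(text.encode("utf-16-be"))


def fits_latin1(text):
    """True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# "smörgåsbord", written with escapes so the source stays pure ASCII.
swedish = "sm\u00f6rg\u00e5sbord"

# "Привет" (Russian "hello"), which lies outside Latin-1.
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"

# The lower half of Latin-1 is byte-for-byte ASCII.
assert "SGML".encode("latin-1") == "SGML".encode("ascii")

# An a-umlaut is a single byte in the upper half of Latin-1.
assert "\u00e4".encode("latin-1") == b"\xe4"

latin1_size, utf16_size = encoded_sizes(swedish)
print(latin1_size, utf16_size)                    # one byte vs. two bytes per character
print(fits_latin1(swedish), fits_latin1(russian))
```

The Swedish word takes one byte per character in Latin-1 and two in
the 16-bit form, which is the doubling mentioned above.  The Russian
word cannot be written in Latin-1 at all, which is the
interchangeability problem a single code set avoids.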