Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!apple!sun-barr!decwrl!ucbvax!hplabs!hpfcso!hpfcdc!donn
From: donn@hpfcdc.HP.COM (Donn Terry)
Newsgroups: comp.std.internat
Subject: Re: ASCII for national characters
Message-ID: <9300002@hpfcdc.HP.COM>
Date: 22 Nov 89 18:43:42 GMT
References: <472@enea.se>
Organization: HP Ft. Collins, Co.
Lines: 47

There are actually a bunch of candidate character sets.

ISO646: 7-bit, kinda like ASCII, one country at a time.  Each country
that uses it has its own national variant in the "changeable"
characters.

ISO8859: 8-bit, using two planes of 96 (or 95, depending on what you do
with DEL) characters.  Suitable for English plus choose 1 of
    Western Europe
    Eastern (Latin) Europe
    Cyrillic
    Arabic
    (Others; all "small" phonetic alphabets)
I don't remember if Eastern Europe includes Turkish or whether it's
another case.

ISO2022: Layers on top of 646 or 8859 (or others) and defines language
shifts.  Blows away any presumption that length of string in
characters == length in bytes == space used in displaying text.

Various Asian national standards for the "Han" ("Chinese") character
set plus national character sets for Japan and Korea.  No unification
of these sets.

ISO10646: 32-bit everything code.  Treats the various Han character
sets as distinct character sets for each national usage, but unifies
the Latin characters into a single set.  Variable length coding
possible to reduce space.  Can degenerate to (something close to) 8859.

UNICODE: this isn't a standard but is proposed.  Unifies the Han
character sets in the same way as the Latin ones (but with obviously a
much bigger payback because of the size).  Fixed length 16 bits.  This
fixes the length in characters vs. length in bytes issue.  (The issue
of length in display space is inherently harder because characters do
vary in width in natural usage in many phonetic alphabets, as well as
in the ideographic ones.  See Arabic and Hindi, where the constant-width
usage is considered "pretty awful", albeit readable.  (Even in English,
good typesetting is not constant width.))

CCITT T2xx (I don't have the exact number).  Another player that I just
recently found out about and don't know anything about in detail.  This
is "teletext", I'm told.

There are certainly more.

Donn Terry
HP Ft. Collins
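
A rough sketch of the "length in characters == length in bytes == space
used in displaying text" point, in Python.  Everything here is an
illustrative assumption rather than part of any of the standards above:
the helper names are made up, Python's "iso2022_jp" codec stands in for
an ISO2022-style shifted encoding, "utf-16-be" stands in for a fixed
16-bit code (which only holds for the characters that fit in 16 bits),
and the display width is a crude terminal-cell estimate.

```python
import unicodedata

def display_cells(text):
    """Crude terminal width: wide/fullwidth East Asian characters count as 2 cells."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def measure(text, codec):
    """Return (characters, bytes in the given codec, approximate display cells)."""
    return len(text), len(text.encode(codec)), display_cells(text)

# "abc" followed by three kanji (hypothetical sample text).
mixed = "abc\u65e5\u672c\u8a9e"

# Shifted encoding: escape sequences switch character sets mid-string,
# so the byte count is not a simple multiple of the character count.
shifted = measure(mixed, "iso2022_jp")

# Fixed 16-bit encoding: exactly two bytes per character for this sample.
fixed16 = measure(mixed, "utf-16-be")

for name, (chars, nbytes, cells) in (("shifted", shifted), ("fixed16", fixed16)):
    print(name, "chars:", chars, "bytes:", nbytes, "cells:", cells)
```

With the fixed 16-bit code the byte count is always twice the character
count, so the first two lengths are tied together again; the display
width still differs, which is the harder problem noted above.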