Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!hellgate.utah.edu!caen!zaphod.mps.ohio-state.edu!think.com!linus!linus!linus!mbunix!eachus
From: eachus@aries.mitre.org (Robert I. Eachus)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: Announcement--new "Unicode" standard
Message-ID:
Date: 25 Feb 91 19:28:53 GMT
References: <39545@cup.portal.com>
Sender: news@linus.mitre.org (News Service)
Organization: The Mitre Corp., Bedford, MA.
Lines: 63
In-Reply-To: Classic_-_Concepts@cup.portal.com's message of 24 Feb 91 06:55:13 GMT
Nntp-Posting-Host: aries.mitre.org

   As someone who has to deal with international standards, and has
had to look at the existing and proposed ISO standards, let me add a
few facts to this discussion...

   Seven-bit ASCII (yes, 7) is the American version of the ISO 646
character set.  There are ten characters in the ISO 646 set known as
"national use" characters, which can be defined differently by
different national standards organizations.  (These include the
{ASCII} characters []$ etc.)  Over thirty of these national sets have
been defined.

   ISO 2022 is a standard for combining two active 7-bit (actually 95
character) sets in a standard 8-bit format, with control characters to
switch active sets.  The control character sets are currently defined
in ISO 6429.  ISO 2022 also allows two-byte character sets such as
Japanese to be embedded in a one-byte stream.

   The various one-byte character sets to use with ISO 2022 are
defined in ISO 8859.  All have the ANSI assignments in the lower half,
and combinations of national character sets in the upper page.  These
include:

   Part 1   Latin-1          Western Europe (except Iceland)
   Part 2   Latin-2          Eastern Europe
   Part 3   Latin-3          Southern Europe
   Part 4   Latin-4          Northern Europe
   Part 5   Latin/Cyrillic
   Part 6   Latin/Arabic
   Part 7   Latin/Greek
   Part 8   Latin/Hebrew
   Part 9   Latin-5          Western Europe (variation)

   The most commonly used of these is Latin-1, which corresponds to
the labels in FED (if you set high to 255 :-).

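   To make the "same lower half, different upper page" point concrete,
here is a minimal Python sketch.  It is an illustration only, not part
of any of the standards: it assumes Python's built-in codecs
(iso8859_1, iso8859_5, iso8859_7) are acceptable stand-ins for the
corresponding ISO 8859 parts, and the names PARTS and decode_in_parts
are made up for this example.

```python
# Sketch only: Python's standard codec names are used here as stand-ins
# for three of the ISO 8859 parts listed above. PARTS and
# decode_in_parts are hypothetical names chosen for this illustration.
PARTS = {
    "Part 1 (Latin-1)": "iso8859_1",
    "Part 5 (Latin/Cyrillic)": "iso8859_5",
    "Part 7 (Latin/Greek)": "iso8859_7",
}


def decode_in_parts(byte_value):
    """Decode a single byte under each part; return {part label: character}."""
    raw = bytes([byte_value])
    return {label: raw.decode(codec) for label, codec in PARTS.items()}


if __name__ == "__main__":
    # 0x41 lies in the shared lower half; 0xE4 lies in the upper page.
    for value in (0x41, 0xE4):
        print(hex(value), decode_in_parts(value))
```

   Byte 0x41 comes out as "A" under every part, while byte 0xE4 is a
different letter in each one (Latin, Cyrillic, Greek), which is why a
receiver has to know which part is active before it can interpret the
upper page.
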
   Now we get to the big stuff.  There is currently a draft ISO
standard 10646, which includes every known character from every
language in the world (with LOTS of room for expansion).  It is known
as MOCS (for multi-octet character set), and each unique character (or
variant such as capitalized, bold, or underlined) has a unique 32-bit
designation.  ISO 10646 can be represented as streams of one, two,
three, or four octets using control characters, and there is a
two-octet subset which contains all of the characters from the ISO
8859 sets, a lot of other small character sets, and a (9025 character)
page each of Japanese, Chinese, and Korean.  This 16-bit set, with
escapes, looks likely to become the standard multi-lingual interchange
format.

   So the Amiga world is "up" to the current 8859 standards, as well
as supporting ISO 646 style national sets, and could theoretically
support full 10646 (if you had enough memory :-).  The Amiga doesn't
currently support 2022 style character set mixing as far as I know
(that's all right, almost no one else does either).  Individual
programs such as Notepad and some DTP programs support mixed fonts,
but I don't think any of them conform to 2022.

--

					Robert I. Eachus

   Our troops will have the best possible support in the entire world.
   And they will not be asked to fight with one hand tied behind their
   back.  President George Bush, January 16, 1991