Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!hellgate.utah.edu!caen!zaphod.mps.ohio-state.edu!think.com!linus!linus!linus!mbunix!eachus
From: eachus@aries.mitre.org (Robert I. Eachus)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: Announcement--new "Unicode" standard
Message-ID: <EACHUS.91Feb25132853@aries.mitre.org>
Date: 25 Feb 91 19:28:53 GMT
References: <39545@cup.portal.com>
Sender: news@linus.mitre.org (News Service)
Organization: The Mitre Corp., Bedford, MA.
Lines: 63
In-Reply-To: Classic_-_Concepts@cup.portal.com's message of 24 Feb 91 06:55:13 GMT
Nntp-Posting-Host: aries.mitre.org


     As someone who has to deal with international standards, and has
had to look at the existing and proposed ISO standards, let me add a
few facts to this discussion...

     Seven-bit ASCII (yes, 7) is the American version of the ISO 646
character set.  There are ten characters in the ISO 646 set known as
"national use" characters, which can be defined differently by
different national standards organizations.  (These include the
{ASCII} characters []$ etc.)  Over thirty of these national sets have
been defined.
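
     As a minimal sketch of what "national use" means in practice, the
Python below decodes the same 7-bit bytes under different national
variants.  The table is only an illustrative subset (the German and
British replacements), and the names NATIONAL_VARIANTS and
decode_iso646 are made up for this example, not part of any standard
or library.

```python
# Illustrative subset of ISO 646 national variants. Each entry maps the
# byte value of a "national use" position to the character that variant
# puts there. Only a few well-known replacements are listed.
NATIONAL_VARIANTS = {
    "us": {},                      # ASCII itself: no replacements
    "gb": {0x23: "\u00a3"},        # '#' position holds the pound sign
    "de": {                        # German variant
        0x40: "\u00a7",            # '@' -> section sign
        0x5B: "\u00c4",            # '[' -> A-umlaut
        0x5C: "\u00d6",            # '\' -> O-umlaut
        0x5D: "\u00dc",            # ']' -> U-umlaut
        0x7B: "\u00e4",            # '{' -> a-umlaut
        0x7C: "\u00f6",            # '|' -> o-umlaut
        0x7D: "\u00fc",            # '}' -> u-umlaut
        0x7E: "\u00df",            # '~' -> sharp s
    },
}


def decode_iso646(data: bytes, variant: str) -> str:
    """Decode 7-bit bytes, substituting the variant's national characters."""
    table = NATIONAL_VARIANTS[variant]
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("ISO 646 is a 7-bit code")
        out.append(table.get(b, chr(b)))
    return "".join(out)
```

     Decoding b"a[i] = {x}" as "us" gives back the C-looking text, while
decoding it as "de" turns every bracket and brace into an umlaut,
which is exactly why source code looks odd on a terminal set to a
national variant.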

     ISO 2022 is a standard for combining two active 7-bit (actually 95
character) sets in a standard 8-bit format, with control characters to
switch active sets.  The control character sets are currently defined
in ISO 6429.  ISO 2022 also allows two-byte character sets such as
Japanese to be embedded in a one-byte stream.
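
     To illustrate the set-switching idea, here is a small sketch using
Python's built-in "iso2022_jp" codec.  Note the assumptions: this is
the 7-bit ISO-2022-JP profile, which uses escape sequences to swap
which set is active, not the full 8-bit arrangement described above,
and the helper split_runs is a hypothetical name that only recognizes
the two escape sequences shown.

```python
ESC = b"\x1b"
TO_JIS_X_0208 = ESC + b"$B"   # switch to the two-byte Japanese set
TO_ASCII = ESC + b"(B"        # switch back to ASCII

# "abc", two kanji, "xyz": one-byte and two-byte text in one stream.
text = "abc \u65e5\u672c xyz"
encoded = text.encode("iso2022_jp")


def split_runs(data: bytes):
    """Split a byte stream into (active_set, raw_bytes) runs.

    Only the two escape sequences defined above are understood; this is
    a sketch of the mechanism, not a general ISO 2022 parser.
    """
    runs = []
    current = "ascii"
    start = 0
    i = 0
    while i < len(data):
        to_jis = data.startswith(TO_JIS_X_0208, i)
        to_ascii = data.startswith(TO_ASCII, i)
        if to_jis or to_ascii:
            if i > start:
                runs.append((current, data[start:i]))
            current = "jis-x-0208" if to_jis else "ascii"
            i += 3          # both escape sequences are three bytes long
            start = i
        else:
            i += 1
    if start < len(data):
        runs.append((current, data[start:]))
    return runs


runs = split_runs(encoded)
```

     Every byte in the encoded stream stays below 0x80; the two kanji
show up as a four-byte run bracketed by the escape sequences, and
encoded.decode("iso2022_jp") recovers the original text.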

     The various one-byte character sets to use with ISO 2022 are
defined in ISO 8859.  All have the ASCII assignments in the lower half,
and combinations of national character sets in the upper half.  These
include:
 
          Part 1   Latin-1   Western Europe (including Iceland)
          Part 2   Latin-2   Eastern Europe
          Part 3   Latin-3   Southern Europe
          Part 4   Latin-4   Northern Europe
          Part 5   Latin/Cyrillic
          Part 6   Latin/Arabic
          Part 7   Latin/Greek
          Part 8   Latin/Hebrew
          Part 9   Latin-5   Western Europe (variation)

     The most commonly used of these is Latin-1, which corresponds to
the labels in FED (if you set high to 255 :-).
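
     A quick way to see the "same lower half, different upper half"
layout is to decode the same bytes under several parts.  The sketch
below uses Python's standard ISO 8859 codecs; the PARTS table and the
function name are just illustrative choices.

```python
# Python codec names for a few of the ISO 8859 parts listed above.
PARTS = {
    "Part 1 Latin-1": "iso8859_1",
    "Part 2 Latin-2": "iso8859_2",
    "Part 5 Latin/Cyrillic": "iso8859_5",
    "Part 7 Latin/Greek": "iso8859_7",
}


def decode_in_each_part(raw: bytes) -> dict:
    """Decode the same bytes under every part in PARTS."""
    return {label: raw.decode(codec) for label, codec in PARTS.items()}


# Printable lower half: identical in every part.
lower_half = bytes(range(0x20, 0x7F))
lower_results = decode_in_each_part(lower_half)

# One byte from the upper half: a different letter in each part.
upper_results = decode_in_each_part(b"\xe6")
```

     The lower half decodes to the same ASCII text everywhere, while the
single byte 0xE6 comes out as a Latin ligature, an accented c, a
Cyrillic letter, or a Greek letter depending on which part is assumed.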

     Now we get to the big stuff.  There is currently a draft ISO
standard 10646, which includes every known character from every
language in the world (with LOTS of room for expansion).  It is known
as MOCS (for multi-octet character set), and each unique character (or
variant such as capitalized, bold, or underlined) has a unique 32-bit
designation.  ISO 10646 can be represented as streams of one, two,
three, or four octets using control characters, and there is a
two-octet subset which contains all of the characters from the ISO
8859 sets, a lot of other small character sets, and a (9025 character)
page each of Japanese, Chinese, and Korean.  This 16-bit set, with
escapes, looks likely to become the standard multi-lingual interchange
format.
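
     To make the octet arithmetic concrete, here is a rough sketch of a
32-bit designation written out as four octets, and of the two-octet
form for values that fit in 16 bits.  It is only an illustration: the
function names are hypothetical, the code values are arbitrary
examples rather than assignments from the draft, and the
control-character mechanism for switching between widths is not
modeled.

```python
import struct


def to_four_octets(code: int) -> bytes:
    """Write a 32-bit character designation as four octets, high octet first."""
    return struct.pack(">I", code)


def fits_two_octet_subset(code: int) -> bool:
    """True when the two high octets are zero, so two octets suffice."""
    return 0 <= code <= 0xFFFF


def to_two_octets(code: int) -> bytes:
    """Write a designation from the 16-bit subset as two octets."""
    if not fits_two_octet_subset(code):
        raise ValueError("designation needs more than two octets")
    return struct.pack(">H", code)


four = to_four_octets(0x41)     # b"\x00\x00\x00A"
two = to_two_octets(0xE9)       # b"\x00\xe9"
```

     The point is just that a 16-bit subset costs half the storage of
the full form, which is one reason it looks attractive as an
interchange format.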

     So the Amiga world is "up" to the current 8859 standards, as well
as supporting ISO 646 style national sets, and could theoretically
support full 10646 (if you had enough memory :-).  The Amiga doesn't
currently support 2022 style character set mixing as far as I know
(that's all right, almost no one else does either).  Individual
programs such as Notepad and some DTP programs support mixed fonts,
but I don't think any of them conform to 2022.
--

					Robert I. Eachus

     Our troops will have the best possible support in the entire
world.  And they will not be asked to fight with one hand tied behind
their back.  President George Bush, January 16, 1991