Path: utzoo!attcan!uunet!nih-csl!lhc!ncifcrf!haven!udel!wuarchive!mit-eddie!bloom-beacon!eru!hagbard!sunic!nuug!sigyn.idt.unit.no!ugle.unit.no!isolde!hta
From: Harald.alvestrand@elab-runit.sintef.no
Newsgroups: comp.protocols.iso
Subject: Re: Character sets: ISO 6937 vs ISO 8859
Keywords: floating accents, ISO 2022
Message-ID: <1990Nov6.114044.28490@ugle.unit.no>
Date: 6 Nov 90 11:40:44 GMT
References: <183*eskovgaa@uvcw.UVic.ca> <9813.657849751@nma.com>
Sender: news@ugle.unit.no
Reply-To: harald.alvestrand@elab-runit.sintef.no
Organization: ELAB-RUNIT, SINTEF, Norway
Lines: 34

ISO 6937 defines floating accents, that is, an A with an accent is represented
as "accent-sign A", 2 octets.
ISO 8859 defines a single sign "accented A", 1 octet.
ISO 6937 also lists the "supported combinations" of accents and characters, and
has a non-spacing underline, which means that you can underline anything.
That in turn means that an eight-character name can take 24 bytes of storage
if it is all underlined, accented characters. Makes things a bit problematic
for programmers of FORTRAN.
In total, ISO 6937 requires about 316 characters or character-accent
combinations to be supported. That covers the needs of the Europeans that use
Latin alphabets.
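
To make the 24-byte arithmetic above concrete, here is a rough Python sketch.
The two byte values and the order of the non-spacing octets are my assumptions
for illustration, not checked against the text of the standard, and
encode_floating is a made-up helper name.

```python
# Assumed, illustrative codes for the non-spacing characters.
ACUTE = 0xC2       # non-spacing acute accent (assumed value)
UNDERLINE = 0xCC   # non-spacing underline (assumed value)

def encode_floating(base, accent=False, underline=False):
    """Emit the optional non-spacing octets first, then the base letter."""
    out = bytearray()
    if underline:
        out.append(UNDERLINE)
    if accent:
        out.append(ACUTE)
    out.append(ord(base))
    return bytes(out)

name = "AAAAAAAA"  # an eight-character name
plain = b"".join(encode_floating(c) for c in name)
worst = b"".join(encode_floating(c, accent=True, underline=True) for c in name)

print(len(plain), len(worst))  # 8 24
```

So a fixed eight-octet field only holds the name if nobody accents or
underlines anything.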

The question of switching character sets belongs to ISO 2022, which defines
escape sequences for the purpose. That in turn refers to the "international
registry of character sets", which is maintained by somebody; I THINK it is
ECMA, but I do not remember this clearly.
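
For a feel of what such escape sequences look like on the wire, here is a
small sketch using Python's iso2022_jp codec. That codec is only one profile
built on ISO 2022 (picked because it ships with Python), so take it as an
illustration of the mechanism, not of how the 8859 sets would be switched.

```python
ESC_TO_JIS = b"\x1b$B"    # ESC $ B: designate the two-byte JIS X 0208 set
ESC_TO_ASCII = b"\x1b(B"  # ESC ( B: designate ASCII again

text = "A\u3042B"  # Latin A, one hiragana letter, Latin B
data = text.encode("iso2022_jp")

# The encoder inserts an escape sequence at each change of character set.
print(data)
print(ESC_TO_JIS in data, ESC_TO_ASCII in data)

# Decoding follows the escape sequences and recovers the original text.
assert data.decode("iso2022_jp") == text
```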

BTW, ISO 8859 is really a collection of character sets, numbered from
ISO 8859-1 (the one the US people are pushing) to ISO 8859-9 (as of now).
In all the sets,
the lower 128 positions are defined in the same way, but the higher positions
may differ. I believe 8859-2 is suitable for the East European languages
(characters like C with an inverted circumflex accent, the caron, are very
important in writing the languages of Czechoslovakia, for instance).
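
The "same lower half, different upper half" point can be checked with the
ISO 8859 codecs that ship with Python. This is only a sketch: the octet 0xC8
is just one position I picked, and the three parts are an arbitrary sample.

```python
parts = ["iso8859_1", "iso8859_2", "iso8859_4"]

# Positions 0..127 decode identically in every part.
low = bytes(range(128))
same_low = len({low.decode(p) for p in parts}) == 1

# One upper position, decoded under each part.
high = {p: b"\xc8".decode(p) for p in parts}

print(same_low)
for p in parts:
    print(p, hex(ord(high[p])))
```

Under 8859-1 that octet is an E with grave accent; under 8859-2 and 8859-4 it
is the C with the inverted circumflex, so the same byte stream reads
differently depending on which set is in effect.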
So, switching BETWEEN character sets is a requirement, at least until ISO 10646
is finalized (if ever). That one attempts to place every character in the world
inside one big 32-bit character set, with ISO 8859-1 as the first 256 code
positions, leading to easy compression of 8859-1 text :-)
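
The compression joke can be sketched in Python, assuming its UTF-32 codec is
an acceptable stand-in for the 32-bit form described here (an assumption on my
part; the sample word is arbitrary).

```python
text = "Bl\u00e5b\u00e6rsyltet\u00f8y"  # sample text using only 8859-1 characters

wide = text.encode("utf-32-be")   # four octets per character, big-endian
narrow = text.encode("latin-1")   # one octet per character (ISO 8859-1)

# For 8859-1 text the three leading octets of every character are zero,
# so "compression" amounts to keeping every fourth octet.
compressed = wide[3::4]

print(len(wide), len(narrow))
print(compressed == narrow)
```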
Any clarity added?

                 Harald Tveit Alvestrand
                 harald.alvestrand@elab-runit.sintef.no