Path: utzoo!attcan!uunet!nih-csl!lhc!ncifcrf!haven!udel!wuarchive!mit-eddie!bloom-beacon!eru!hagbard!sunic!nuug!sigyn.idt.unit.no!ugle.unit.no!isolde!hta
From: Harald.alvestrand@elab-runit.sintef.no
Newsgroups: comp.protocols.iso
Subject: Re: Character sets: ISO 6937 vs ISO 8859
Keywords: floating accents, ISO 2022
Message-ID: <1990Nov6.114044.28490@ugle.unit.no>
Date: 6 Nov 90 11:40:44 GMT
References: <183*eskovgaa@uvcw.UVic.ca> <9813.657849751@nma.com>
Sender: news@ugle.unit.no
Reply-To: harald.alvestrand@elab-runit.sintef.no
Organization: ELAB-RUNIT, SINTEF, Norway
Lines: 34

ISO 6937 defines floating accents: an A with an accent is represented as "accent-sign A", 2 octets. ISO 8859 instead defines a single code for "accented A".

ISO 6937 also lists the "supported combinations" of accents and characters, and it has a non-spacing underline, which means that you can underline anything. That in turn means that an eight-character name can take 24 bytes of storage if it consists entirely of underlined, accented characters (underline + accent + letter = 3 bytes per character). That makes things a bit problematic for FORTRAN programmers, whose character variables have a fixed length.

In total, ISO 6937 requires about 316 characters or character-accent combinations to be supported. That covers the needs of the European languages written in the Latin alphabet.

The question of switching character sets belongs to ISO 2022, which defines escape sequences for the purpose. ISO 2022 in turn refers to the "international registry of character sets", which is maintained by somebody; I THINK it is ECMA, but I do not remember this clearly.

BTW, ISO 8859 is really a collection of character sets, numbered from ISO 8859-1 (the one the US people are pushing) to ISO 8859-9 (as of now). In all the sets, the lower 128 positions are defined in the same way (as ASCII), but the upper positions differ from set to set. ISO 8859-2 is the one meant for the East European languages (characters like C with an inverted circumflex accent, i.e. a caron, are very important in writing the languages of Czechoslovakia, for instance).
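For the curious, here is a small Python sketch of the two mechanisms above. It is hypothetical and not taken from the text of any standard: `encode_floating` mimics ISO 6937-style floating accents (diacritic byte first, then the base letter), and `encode_segments`/`decode_stream` switch between ISO 8859-1 and ISO 8859-2 using ISO 2022 "ESC - F" designations. The byte values are from memory and should be checked against the standards; in particular, 0xCC for the non-spacing underline is an assumption. The switching code also assumes G1 is already invoked into the upper half of an 8-bit code, which glosses over ISO 2022's invocation rules.

```python
import unicodedata

# Prefix bytes for non-spacing diacritics in the ISO 6937 0xC1-0xCF column,
# as I recall them; verify against the standard before relying on them.
# 0xCC for the non-spacing underline is an ASSUMPTION (it is the T.61 value).
DIACRITIC_PREFIX = {
    "\u0300": 0xC1,  # grave
    "\u0301": 0xC2,  # acute
    "\u0302": 0xC3,  # circumflex
    "\u0303": 0xC4,  # tilde
    "\u0304": 0xC5,  # macron
    "\u0306": 0xC6,  # breve
    "\u0307": 0xC7,  # dot above
    "\u0308": 0xC8,  # diaeresis
    "\u030A": 0xCA,  # ring above
    "\u0327": 0xCB,  # cedilla
    "\u0332": 0xCC,  # non-spacing underline (assumed)
    "\u030B": 0xCD,  # double acute
    "\u0328": 0xCE,  # ogonek
    "\u030C": 0xCF,  # caron ("inverted circumflex")
}


def encode_floating(text: str) -> bytes:
    """Hypothetical floating-accent encoder: every diacritic byte is emitted
    before its base letter. Only ASCII base letters are handled."""
    decomposed = unicodedata.normalize("NFD", text)  # base letter + marks
    out = bytearray()
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        if ord(base) > 0x7F:
            raise ValueError(f"no ASCII base letter for {base!r}")
        i += 1
        # Collect the combining marks that follow this base letter.
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            out.append(DIACRITIC_PREFIX[decomposed[i]])  # KeyError if unsupported
            i += 1
        out.append(ord(base))
    return bytes(out)


# ISO 2022 "ESC - F" designates a 96-character set to G1. The final bytes
# below (ISO-IR 100 and 101) are from memory; verify against the registry.
ESC_G1_96 = b"\x1b\x2d"
G1_FINAL = {"iso8859_1": b"A", "iso8859_2": b"B"}
FINAL_TO_CODEC = {final: codec for codec, final in G1_FINAL.items()}


def encode_segments(segments: list[tuple[str, str]]) -> bytes:
    """Encode (codec, text) pairs, emitting a designation on every switch."""
    out = bytearray()
    current = None
    for codec, text in segments:
        if codec != current:
            out += ESC_G1_96 + G1_FINAL[codec]
            current = codec
        out += text.encode(codec)
    return bytes(out)


def decode_stream(data: bytes, default: str = "iso8859_1") -> str:
    """Decode a byte stream, switching codec at each G1 designation."""
    codec, pieces, start, i = default, [], 0, 0
    while i < len(data):
        if data[i:i + 2] == ESC_G1_96 and i + 2 < len(data):
            pieces.append(data[start:i].decode(codec))
            codec = FINAL_TO_CODEC[data[i + 2:i + 3]]
            i += 3
            start = i
        else:
            i += 1
    pieces.append(data[start:].decode(codec))
    return "".join(pieces)
```

With these definitions, `encode_floating("Dvořák")` gives `b"Dvo\xcfr\xc2ak"` (8 bytes), and an eight-character name of underlined, accented letters such as `"é\u0332" * 8` comes out at 24 bytes. `"Dvořák".encode("latin-1")` fails outright, because ř is in ISO 8859-2 but not in 8859-1; that is why a mixed text such as `[("iso8859_1", "Ærø "), ("iso8859_2", "Dvořák")]` needs a designation escape at each switch.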
So, switching BETWEEN character sets is a requirement, at least until ISO 10646 is finalized (if ever). ISO 10646 attempts to fit every character in the world into one big 32-bit character set, with ISO 8859-1 as the first 256 code positions, leading to easy compression of 8859-1 text :-)

Any clarity added?

Harald Tveit Alvestrand
harald.alvestrand@elab-runit.sintef.no