Path: utzoo!attcan!uunet!fernwood!apple!uokmax!munnari.oz.au!samsung!zaphod.mps.ohio-state.edu!uwm.edu!bionet!agate!ucbvax!edinburgh.ac.uk!K.P.Donnelly
From: K.P.Donnelly@edinburgh.ac.uk
Newsgroups: comp.protocols.iso
Subject: Re: Character sets: ISO 6937 vs ISO 8859
Message-ID: <05.Nov.90.19:39:02.gmt.320277@EMAS-A>
Date: 5 Nov 90 19:39:02 GMT
References: <1411.657811134@UK.AC.UCL.CS>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 54

ISO 6937 and ISO 8859 are both extensions of ASCII (or ISO 646) to 8 bits.
Both of them avoid not only the 32 control characters of ISO 646 (columns
0 and 1) but also their 8-bit equivalents (columns 8 and 9), so as to avoid
possible transmission or other difficulties.
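
To make the column layout concrete, here is a minimal Python sketch. It
assumes the usual code-table convention that the column is the high four
bits of the byte; the names column and is_reserved_for_control are
made up for illustration and come from neither standard.

```python
# Code-table "columns" are the high four bits of a byte, rows the low four.
# Columns 0-1 hold the ISO 646 control characters; columns 8-9 are their
# 8-bit equivalents, which both standards also leave alone.
CONTROL_COLUMNS = {0, 1, 8, 9}


def column(byte: int) -> int:
    """Return the code-table column (0-15) of an 8-bit value."""
    return byte >> 4


def is_reserved_for_control(byte: int) -> bool:
    """True if the byte lies in a column set aside for control codes."""
    return column(byte) in CONTROL_COLUMNS


reserved = [b for b in range(256) if is_reserved_for_control(b)]
print(len(reserved))                  # 64 positions in all
print(is_reserved_for_control(0x1B))  # ESC, column 1
print(is_reserved_for_control(0x9B))  # column 9
print(is_reserved_for_control(0xE9))  # column 14, a graphic position
```

Four columns of sixteen positions each gives 64 reserved positions out
of 256.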

The most important difference is that ISO 6937 has "floating diacritics"
- characters of "zero width" representing accents, so that accented characters
are represented by two bytes, one for the unaccented character and one for
the accent.  This means that it can accommodate many more accented characters
within 8 bits than can ISO 8859.  In fact it copes with almost all languages
using a Latin-based alphabet.
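
A rough Python sketch of how a decoder might fold a floating diacritic
into a single accented character follows. Several things here are
assumptions rather than anything stated above: that the accent byte
comes before the base letter, that the accent codes in the table are
right (they are a partial list from memory and should be checked against
the standard), and the function name decode_6937_sketch, which is
hypothetical. Only ASCII base letters are handled.

```python
import unicodedata

# Partial, illustrative table: assumed ISO 6937 accent byte -> Unicode
# combining mark. Check these values against the standard before relying
# on them.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
}


def decode_6937_sketch(data: bytes) -> str:
    """Decode ASCII plus accent+letter pairs; reject anything else."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in ACCENTS:
            # Assumption: the accent byte precedes its base letter.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("accent byte not followed by an ASCII letter")
            base = chr(data[i + 1])
            # NFC turns letter + combining mark into one precomposed
            # character where Unicode has one.
            out.append(unicodedata.normalize("NFC", base + ACCENTS[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not covered by this sketch")
    return "".join(out)


encoded = bytes([0x63, 0x61, 0x66, 0xC2, 0x65])  # c a f <acute> e
text = decode_6937_sketch(encoded)
print(ascii(text), len(encoded), len(text))
```

The five bytes decode to four characters, which is the sort of mismatch
that trips up software written on the assumption of one byte per
character.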

However, it also means that most existing software, such as text editors,
will not cope with ISO 6937, whereas most software needs little or no
modification to work with ISO 8859.  Probably this is why ISO 6937,
although it came earlier than ISO 8859, has never really been adopted,
whereas ISO 8859 is becoming very widely used.

ISO 8859, because of the limited number of characters which it gets into
8 bits, has to be split into several parts.  ISO 8859-1 covers nearly all
Western European languages, which includes a lot of languages with economic
clout.  ISO 8859-2 covers Eastern European languages with a Latin-based
alphabet such as Czech and Polish.  ISO 8859-3 and 8859-4 mop up some of
the gaps.  ISO 8859-5 is for languages like Russian with a Cyrillic alphabet,
8859-6 is for Arabic, 8859-7 is for Greek and 8859-8 is for Hebrew.
ISO 8859-9 is a late addition; it adds to ISO 8859-1 the characters needed for
Turkish, at the expense of Icelandic, which has far fewer speakers than Turkish
but which got included in ISO 8859-1 because the Icelanders got into 8-bit
computing at an early stage and also because some of the characters are used
in Old English.  I don't know whether ISO 6937 has any additional parts for
languages such as Russian or Arabic with a non-Latin alphabet.
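
The split into parts means that a byte only has a meaning once you know
which part is in use. The Python sketch below illustrates this with the
codecs that ship with Python (iso8859_1, iso8859_2 and so on). These
follow current editions of the standards, so treat the output as
illustrative rather than as a record of the 1990 texts. The variable
names are made up for the example.

```python
import unicodedata

# The same byte value read under four different parts of ISO 8859.
PARTS = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7"]
same_byte = {part: bytes([0xE6]).decode(part) for part in PARTS}
for part, ch in same_byte.items():
    print(part, unicodedata.name(ch))

# Every position where 8859-9 differs from 8859-1: the Icelandic
# letters that gave way to Turkish ones.
turkish_swaps = {}
for b in range(256):
    in_part_1 = bytes([b]).decode("iso8859_1")
    in_part_9 = bytes([b]).decode("iso8859_9")
    if in_part_1 != in_part_9:
        turkish_swaps[b] = (in_part_1, in_part_9)

for b, (old, new) in sorted(turkish_swaps.items()):
    print(f"0x{b:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

With Python's tables the second loop reports six positions, all of them
Icelandic letters (eth, thorn and y-acute, in both cases) replaced by
Turkish ones.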

ISO 6937 is a development from Teletex.  ISO 8859-1 is a development of the
DEC multinational character set.  Various manufacturers extended ASCII to
8 bits in various ways (e.g. IBM-PC character set; HP Roman 8 character set
used on Laserjet II laser printers), but the DEC multinational character set
has a far more logical layout of characters than the others.  ISO 8859-1 is
used on DEC VT320 terminals, and terminal emulations such as MS-Kermit 3.0.
The reason that X.400(1988) refers to ISO 6937 whereas X-Windows makes use of
ISO 8859 may be the association between CCITT and Teletex and the association
between DEC and the development of X-Windows, or it may just be that 
X.400(1988) was developed earlier on.

It is now regarded as wasteful to reserve as many as 64 character
positions for control characters, and proposals have been made to
extend ISO 8859-1 to cover more languages.  Alternatively, it is possible
that ISO 6937 might make something of a comeback within the context of
structured documents.  Or both ideas might be leapfrogged by two-byte or
multi-byte character sets, with file compression for storage.

I am no expert and some of the above information may be wrong.  If so, I would
be glad of corrections.

   Kevin Donnelly