Path: utzoo!attcan!uunet!ogicse!ucsd!ucbvax!ICS.UCI.EDU!Stef
From: Stef@ICS.UCI.EDU (Einar Stefferud)
Newsgroups: comp.protocols.iso
Subject: Re: Character sets: ISO 6937 vs ISO 8859
Message-ID: <9813.657849751@nma.com>
Date: 6 Nov 90 00:02:31 GMT
References: <183*eskovgaa@uvcw.UVic.ca>
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Stef@ics.uci.edu
Organization: The Internet
Lines: 66

Since no one has hit Steve's question on the head yet, I will take a
shot at it.  

8859 was designed to facilitate data processing, and thus it is limited
to fixed-length 8-bit codes, one per character, so as to avoid the pain
of processing mixed-length character codes.  

6937 is more "transmission" oriented, with prefix codes (non-spacing
accents, for example) that change the meaning of the characters that
follow.  6937 is favored by various communication-oriented processor
vendors.  
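
A small Python sketch of the fixed-length versus mixed-length contrast
may help here.  It is illustrative only.  It assumes that 6937 sends an
accented letter as a non-spacing accent byte followed by the base
letter, and that 0xC2 is the acute accent; the names ACUTE_6937 and
encode_6937_sketch are made up for this example.  The 8859 side uses
part 1 (Latin-1) through Python's built-in "latin-1" codec.

```python
# Illustrative only: the 6937 byte value below is an assumption of this sketch.
ACUTE_6937 = 0xC2  # assumed non-spacing acute accent in ISO 6937


def encode_8859_1(text):
    """One byte per character, always (Python's built-in latin-1 codec)."""
    return text.encode("latin-1")


def encode_6937_sketch(text):
    """Toy encoder: ASCII passes through, 'é' becomes accent byte + base letter."""
    out = bytearray()
    for ch in text:
        if ch == "é":
            out += bytes([ACUTE_6937, ord("e")])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError("character not covered by this sketch: %r" % ch)
    return bytes(out)


word = "café"
fixed = encode_8859_1(word)       # same number of bytes as characters
mixed = encode_6937_sketch(word)  # the accented letter costs two bytes
print(len(word), len(fixed), len(mixed))
```

With the 8859 form, the Nth character is always the Nth byte, which is
what record-oriented data processing wants.  With the 6937-style form,
a program has to scan from the start to find character boundaries.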

I believe that XEROX uses 6937 quite effectively to support many
languages and character sets for their very much internationally
oriented document publishing systems.  

FTAM supported 8859 because of the data processing orientation of the
people involved with making the implementors' agreement profiles.  X.400,
by contrast, has an obvious tilt toward documents rather than business
records, which is why it leans toward 6937.  

Hope this helps at the meta understanding level.  The conflict between
8859 and 6937 is thus deep and unresolvable, though there are some
incomplete ways to map some very useful parts of 6937 onto 8859.  I
expect that all 8859 characters have 6937 equivalents, but this is only
a guess on my part.  
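
As a rough sketch of what an incomplete mapping looks like, the Python
below converts a tiny 6937-style subset to 8859-1 and gives up when the
result has no 8859-1 code.  The accent byte values in DIACRITICS are
assumptions of this sketch, the function name map_6937_to_8859_1 is
hypothetical, and only part 1 of 8859 is considered.

```python
import unicodedata

# Assumed 6937 non-spacing diacritic bytes -> Unicode combining marks.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis (umlaut)
    0xCF: "\u030C",  # caron
}


def map_6937_to_8859_1(data):
    """Return 8859-1 bytes, or None if some character has no 8859-1 code."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS and i + 1 < len(data):
            # 6937 order is accent first, base letter second; Unicode wants
            # the base letter first, so swap them before composing.
            chars.append(chr(data[i + 1]) + DIACRITICS[b])
            i += 2
        elif b < 0x80:
            chars.append(chr(b))
            i += 1
        else:
            return None  # byte outside this sketch's tiny table
    text = unicodedata.normalize("NFC", "".join(chars))
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


print(map_6937_to_8859_1(bytes([0xC2, 0x65])))  # acute + e
print(map_6937_to_8859_1(bytes([0xCF, 0x63])))  # caron + c
```

The first call succeeds because e-acute has a code in 8859-1.  The
second returns None because c-caron does not, so that part of 6937
cannot be carried over to 8859-1.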

I have no knowledge of 10646, though I expect that it is intended to
somehow resolve the problems between 8859 and 6937.  

The character set question is a very big mess, and getting worse as the
effort to close on something common for the world takes root.  There are
3 main camps.  

North America, where we have little problem with just using ASCII, and
we wish the rest of the world would settle the question without making
life too complicated for our users who only have ASCII keyboards.  

Europe, where there are many alphabets and lots of accents and umlauts.
EWOS and others in Europe are becoming deeply involved in this mess.  

Asia, where there are 3 main KANJI alphabets which are very difficult to
meld into some kind of single "alphabet".  Japanese KANJI characters are
strictly limited, and Katakana characters are used as modifiers to
extend the limited set.  Chinese KANJI is not so limited, with new
characters being invented over time, and with no "Katakana analog" to
use for extension.  I expect that Korean is more like Chinese, but I am
not even slightly expert in this.  Are there any other ideographic
alphabets?  

Anyway, the overall problems will have to be resolved among those
countries that have real problems with any loss of the right to use any
of their normally used characters as we move to electronic media.
Although we ASCII folk may be tempted to look askance at all this
character set confusion, I think that we should at least offer our
sympathies to those with the real problems, while we try to keep things
from getting too complicated.  

I sort of shudder at being required to enter Katakana or KANJI or
umlauts and accents into ORAddresses, now that X.400 allows T.61
characters in ORAddresses.  I wonder how it can be done with my present
systems and keyboards?  

I have seen how the Japanese have modified EMACS to input and display
Katakana and KANJI.  Rather ingenious it is, and a real testimonial to
the power of EMACS.  

Best...\Stef