Path: utzoo!attcan!uunet!ogicse!ucsd!ucbvax!ICS.UCI.EDU!Stef
From: Stef@ICS.UCI.EDU (Einar Stefferud)
Newsgroups: comp.protocols.iso
Subject: Re: Character sets: ISO 6937 vs ISO 8859
Message-ID: <9813.657849751@nma.com>
Date: 6 Nov 90 00:02:31 GMT
References: <183*eskovgaa@uvcw.UVic.ca>
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: Stef@ics.uci.edu
Organization: The Internet
Lines: 66

Since no one has hit Steve's question on the head yet, I will take a shot at it.

8859 was designed to facilitate data processing, so it is limited to single 8-bit codes, to avoid the pain of processing data in mixed-length character codes. 6937 is more "transmission" oriented, with escape codes that signal semantic shifts for the characters that follow. 6937 is favored by various communication-oriented processor vendors. I believe that XEROX uses 6937 quite effectively to support many languages and character sets in their very internationally oriented document publishing systems.

FTAM supported 8859 because of the data-processing orientation of the people who wrote the implementors' agreement profiles. X.400 has an obvious tilt toward documents rather than business records.

Hope this helps at the meta-understanding level.

The conflict between 8859 and 6937 is thus deep and unresolvable, though there are some incomplete ways to map some very useful parts of 6937 onto 8859. I expect that all 8859 characters have 6937 equivalents, but this is only a guess on my part.

I have no knowledge of 10464, though I expect that it is intended to somehow resolve the problems between 8859 and 6937.

The character set question is a very big mess, and it is getting worse as the effort to settle on something common for the whole world takes root. There are 3 main camps.

North America, where we have little problem with just using ASCII, and we wish the rest of the world would settle the question without making life too complicated for our users who have only ASCII keyboards.
Europe, where there are many alphabets and lots of accents and umlauts. EWOS and others in the EU are becoming deeply involved in this mess.

Asia, where there are 3 main KANJI alphabets, which are very difficult to meld into some kind of single "alphabet". Japanese KANJI characters are strictly limited, and Katakana characters are used as modifiers to extend the limited set. Chinese KANJI is not so limited: new characters are invented over time, and there is no "Katakana analog" to use for extension. I expect that Korean is more like Chinese, but I am not even slightly expert in this. Are there any other ideographic alphabets?

Anyway, the overall problem will have to be resolved among the countries that face real problems with any loss of the right to use their normal characters as we move to electronic media. Although we ASCII folk may be inclined to look askance at all this character-set confusion, I think we should at least offer our sympathies to those with the real problems, while we try to keep things from getting too complicated.

I sort of shudder at being required to enter Katakana or KANJI or umlauts and accents into ORAddresses, now that X.400 allows T.61 characters in ORAddresses. I wonder how it can be done with my present systems and keyboards.

I have seen how the Japanese have modified EMACS to input and display Katakana and KANJI. Rather ingenious it is, and a real testament to the power of EMACS.

Best...\Stef
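To make the 8859 versus 6937 contrast in this post concrete, here is a minimal Python sketch. It encodes an umlauted name of the kind that might end up in an ORAddress. Python's standard library has a Latin-1 (ISO 8859-1) codec but no ISO 6937 codec, so `encode_6937_subset` and `count_6937_chars` are hypothetical helpers. They cover only five diacritics and assume the ISO 6937 convention of a non-spacing diacritic byte (range 0xC1-0xCF) placed *before* the base letter. The byte values should be checked against the standard before any real use.

```python
import unicodedata

# Hypothetical partial table (an assumption, not a full 6937 codec):
# Unicode combining mark -> ISO 6937 non-spacing diacritic prefix byte.
DIACRITIC_PREFIX = {
    "\u0300": 0xC1,  # grave
    "\u0301": 0xC2,  # acute
    "\u0302": 0xC3,  # circumflex
    "\u0308": 0xC8,  # diaeresis (umlaut)
    "\u0327": 0xCB,  # cedilla
}


def encode_6937_subset(text: str) -> bytes:
    """Encode ASCII plus single-accent Latin letters, 6937 style (sketch only)."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))  # plain ASCII: one byte, same as 8859
            continue
        # Split e.g. "ü" into base "u" plus combining diaeresis U+0308.
        base, *marks = unicodedata.normalize("NFD", ch)
        if len(marks) != 1 or marks[0] not in DIACRITIC_PREFIX or ord(base) >= 0x80:
            raise ValueError(f"not covered by this sketch: {ch!r}")
        out.append(DIACRITIC_PREFIX[marks[0]])  # diacritic comes first...
        out.append(ord(base))                   # ...then the letter it modifies
    return bytes(out)


def count_6937_chars(data: bytes) -> int:
    """Count characters by scanning: a prefix byte combines with the next byte."""
    count = 0
    i = 0
    while i < len(data):
        i += 2 if 0xC1 <= data[i] <= 0xCF else 1
        count += 1
    return count


word = "Müller"  # hypothetical surname with an umlaut
latin1 = word.encode("latin-1")      # ISO 8859-1: always one byte per character
iso6937 = encode_6937_subset(word)   # ISO 6937 style: accented letters take two bytes

print(len(word), len(latin1), len(iso6937), count_6937_chars(iso6937))  # 6 6 7 6
print(latin1.hex(" "))   # 4d fc 6c 6c 65 72
print(iso6937.hex(" "))  # 4d c8 75 6c 6c 65 72
```

With the Latin-1 bytes, the n-th character is simply `latin1[n]`. With the 6937-style bytes, you have to scan from the start to find it. That is the mixed-length data-processing pain the post attributes to 6937.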