Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!mcvax!kth!sunic!maxim!prc
From: prc@erbe.se (Robert Claeson)
Newsgroups: comp.mail.misc
Subject: Re: 8-bit mail
Summary: ISO 8859/2 for Eastern Europe and ISO 8859/4 for Scandinavia
Keywords: 8-bit character sets, electronic mail
Message-ID: <746@maxim.erbe.se>
Date: 28 Jun 89 19:50:02 GMT
References: <742@maxim.erbe.se> <759@isaak.UUCP> <127@hafro.is> <24-Jun-89.210351@192.41.214.2>
Reply-To: rclaeson@erbe.se (Robert Claeson)
Organization: International Extremists for Preservation of Cultural Differences
Lines: 60

In article <24-Jun-89.210351@192.41.214.2> amanda@intercon.uu.net (Amanda Walker) writes:

>It sounds like a lot of the European UNIX community has
>standardized on ISO 8859/1, which is a step forward from ISO 646 (since
>it greatly widens the geographical area served by a single character set),
>but it still only puts the problem off for a while, and is only really
>useful for most of western Europe.  Eastern Europe, parts of the
>Mediterranean, and the Pacific Rim countries are still left high and
>dry (to name a few).  They don't have much presence in the global E-mail
>networks now, but it will only increase.

You forgot the character sets needed for the indigenous languages of
Scandinavia, such as Lappish. But ISO has solutions for these problems
(as long as people cooperate)...

>Even in western Europe, character sets are still a problem.  There are an
>awful lot of people out there still using the DEC Multinational Character
>Set, which is similar to but not the same as ISO 8859/1.

And I believe that there is an ANSI 8-bit character set as well, of which
Microsoft uses a subset similar to the DEC Multinational character set.
And then there are people with HP terminals who use HP's Roman-8 character
set, which is quite dissimilar to the ISO 8-bit character sets, not to mention
all the IBM users out there with their PCs, ATs, PS/2s and even RTs that use
IBM's very own ASCII-superset 8-bit character set (and I don't mean EBCDIC).
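
To make the mismatch concrete, here is a rough Python sketch that encodes
one letter in three of the sets mentioned above and shows the byte each one
uses. Two assumptions of mine: that "IBM's very own" set means PC code page
437, and that Python's bundled iso8859_1, cp437 and hp_roman8 tables are
faithful to the real ones. The names SETS and code_points are made up for
the illustration.

```python
# Sketch only: codec choices are my assumptions, not something from the thread.
SETS = {
    "ISO 8859/1": "iso8859_1",
    "IBM PC (assumed code page 437)": "cp437",
    "HP Roman-8": "hp_roman8",
}


def code_points(ch):
    """Return the single byte value each character set assigns to ch."""
    return {name: ch.encode(codec)[0] for name, codec in SETS.items()}


if __name__ == "__main__":
    # U+00E5 is the Swedish letter a-with-ring.
    for name, value in code_points("\u00e5").items():
        print(f"{name:32} 0x{value:02X}")
```

A message written on one of these systems and displayed unconverted on
another shows the wrong glyph for every such byte.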

>One of the biggest problems I have in writing code for MUAs and NUAs (News
>User Agents :-)) is determining what character set a given message is using.

And how to invoke that character set on the terminal currently in use, or
how to map it into one of the character sets the terminal has, or how to
display the message if the terminal doesn't support the character set
the message is written in.
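
The mapping and fallback cases could be sketched roughly as below, in
Python. This is only an illustration under my own assumptions: it uses
Unicode as the pivot between the two sets, it takes both character set
names as given (which is exactly the hard part), and the function name
to_terminal and the "?" substitute are invented for the example.

```python
def to_terminal(raw, message_charset, terminal_charset, fallback=b"?"):
    """Re-encode message bytes for a terminal, substituting what it lacks."""
    # Undecodable input bytes become U+FFFD, which no 8-bit set contains,
    # so they end up as the fallback as well.
    text = raw.decode(message_charset, errors="replace")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(terminal_charset)
        except UnicodeEncodeError:
            # The terminal's set has no such character: show a substitute.
            out += fallback
    return bytes(out)


if __name__ == "__main__":
    # An ISO 8859/1 a-with-ring shown on a code page 437 terminal.
    print(to_terminal(b"\xe5", "iso8859_1", "cp437"))
```

A real MUA would probably want something better than "?", such as the
nearest unaccented letter, but the structure would be the same.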

>One thing I would really like to see is for MUA's to start using the
>Content-Type: field (or at least X-Content-Type:) in RFC 822 messages.
>This way the MUA can have a set of common standards it knows about, and can
>translate to whatever the user wants without lots of fancy footwork.
.....

Yes, that would be nice. Any volunteers?
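
For whoever volunteers, the receiving side might look something like this
Python sketch. Note the assumption: the thread only proposes using a
Content-Type: field, and the "charset=" parameter syntax below is one
possible convention that I have picked for illustration, not something
agreed on here. The helper name charset_of is invented too.

```python
from email import message_from_string


def charset_of(raw_message, default="us-ascii"):
    """Return the character set a message declares, or default if none."""
    msg = message_from_string(raw_message)
    # get_content_charset() reads the charset parameter of Content-Type
    # and returns it lowercased; it returns the default when it is absent.
    return msg.get_content_charset(default)


if __name__ == "__main__":
    sample = (
        "From: rclaeson@erbe.se\n"
        "Subject: 8-bit mail\n"
        "Content-Type: text/plain; charset=ISO-8859-1\n"
        "\n"
        "body text\n"
    )
    print(charset_of(sample))
```

With the set declared in the header, the MUA no longer has to guess before
it translates for the user's terminal.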

>ISO 8859/1 is just the start.  Eventually, I hope the ISO finishes their
>multibyte character set standard (10646?), but who know when that will
>happen...

In the meantime, use the 8-bit character sets defined in the ISO 8859
standard (I believe that there are 9 of them now) with the code shift
techniques described in some other ISO standard whose number I forgot.
This way, different ISO-standardized character sets can be used in the
same message to, say, write in English, mention some German names and
quote a few Lappish phrases. I believe that AT&T UNIX SVR4 will use
this technique as part of its native language support system.
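
A toy decoder for such a mixed message might look like the Python sketch
below. Assumptions, all mine: that the code shift standard in question is
ISO 2022, that "ESC - F" designates a 96-character set as G1 with the
final bytes A, B and D meaning the right halves of 8859/1, 8859/2 and
8859/4, and that bytes with the high bit set are always read from G1.
Check the escape sequences against the ISO registry before relying on
them. The names G1_SETS and decode_shifted are invented.

```python
ESC = 0x1B

# Assumed final bytes for "ESC - F"; not verified against the registry.
G1_SETS = {
    ord("A"): "iso8859_1",
    ord("B"): "iso8859_2",
    ord("D"): "iso8859_4",
}


def decode_shifted(data, initial="iso8859_1"):
    """Decode bytes in which escape sequences switch the 8-bit set in use."""
    current = initial
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        is_designation = (
            b == ESC
            and data[i + 1:i + 2] == b"-"
            and len(data) > i + 2
            and data[i + 2] in G1_SETS
        )
        if is_designation:
            # Switch the set used for the upper half; emit nothing.
            current = G1_SETS[data[i + 2]]
            i += 3
            continue
        if b < 0x80:
            out.append(chr(b))  # the lower half is ASCII in every part
        else:
            out.append(bytes([b]).decode(current))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # German text in 8859/1, then a switch to 8859/4 for one letter.
    print(decode_shifted(b"Gr\xfc\xdfe, \x1b-D\xbf"))
```

Unrecognised escape sequences are passed through untouched here, which a
real implementation would have to handle properly.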

Now, is there anyone who knows of a terminal that supports all the ISO
8859 character sets and the code shift sequences?  :-)

-- 
          Robert Claeson      E-mail: rclaeson@erbe.se
	  ERBE DATA AB