Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!mcvax!kth!sunic!maxim!prc
From: prc@erbe.se (Robert Claeson)
Newsgroups: comp.mail.misc
Subject: Re: 8-bit mail
Summary: ISO 8859/2 for Eastern Europe and ISO 8859/4 for Scandinavia
Keywords: 8-bit character sets, electronic mail
Message-ID: <746@maxim.erbe.se>
Date: 28 Jun 89 19:50:02 GMT
References: <742@maxim.erbe.se> <759@isaak.UUCP> <127@hafro.is> <24-Jun-89.210351@192.41.214.2>
Reply-To: rclaeson@erbe.se (Robert Claeson)
Organization: International Extremists for Preservation of Cultural Differences
Lines: 60

In article <24-Jun-89.210351@192.41.214.2> amanda@intercon.uu.net (Amanda Walker) writes:

>It sounds like a lot of the European UNIX community has
>standardized on ISO 8859/1, which is a step forward from ISO 646 (since
>it greatly widens the geographical area served by a single character set),
>but it still only puts the problem off for a while, and is only really
>useful for most of western Europe. Eastern Europe, parts of the
>Mediterranean, and the Pacific Rim countries are still left high and
>dry (to name a few). They don't have much presence in the global E-mail
>networks now, but it will only increase.

You forgot the native character sets in Scandinavia, used for languages
such as Lappish. But ISO has solutions for these problems (as long as
people cooperate)...

>Even in western Europe, character sets are still a problem. There are an
>awful lot of people out there still using the DEC Multinational Character
>Set, which is similar to but not the same as ISO 8859/1.

And I believe that there is an ANSI 8-bit character set as well, of which
Microsoft uses a subset similar to the DEC Multinational character set.
And then there are people with HP terminals who use HP's Roman-8
character set, which is quite dissimilar to the ISO 8-bit character sets,
not to mention all the IBM users out there with their PCs, ATs, PS/2s and
even RTs that use IBM's very own ASCII-superset 8-bit character set (and
I don't mean EBCDIC).

>One of the biggest problems I have in writing code for MUAs and NUAs (News
>User Agents :-)) is determining what character set a given message is using.

And how to invoke that character set on the terminal currently in use, or
how to map it into one of the character sets the terminal has, or how to
display the message if the terminal doesn't support the character set the
message is written in.

>One thing I would really like to see is for MUA's to start using the
>Content-Type: field (or at least X-Content-Type:) in RFC 822 messages.
>This way the MUA can have a set of common standards it knows about, and can
>translate to whatever the user wants without lots of fancy footwork. .....

Yes, that would be nice. Any volunteers?

>ISO 8859/1 is just the start. Eventually, I hope the ISO finishes their
>multibyte character set standard (10646?), but who knows when that will
>happen...

In the meantime, use the 8-bit character sets defined in the ISO 8859
standard (I believe that there are 9 of them now) with the code shift
techniques described in some other ISO standard whose number I forgot.
This way, different ISO-standardized character sets can be used in the
same message to, say, write in English, mention some German names and
quote a few Lappish phrases. I believe that AT&T UNIX SVR4 will use this
technique as part of its native language support system.

Now, is there anyone who knows of a terminal that supports all the ISO 8859
character sets and the code shift sequences? :-)

-- 
Robert Claeson      E-mail: rclaeson@erbe.se
ERBE DATA AB
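
The Content-Type idea discussed above (label the message with its character set, then let the user agent translate to whatever the terminal has) can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say how the character set would be named in the header, so the `charset=` parameter is assumed here, and the function name `render_for_terminal` and the ISO 8859/1 default are hypothetical choices, not anything proposed in the post.

```python
from email import message_from_bytes


def render_for_terminal(raw_message: bytes,
                        terminal_charset: str,
                        default_charset: str = "iso-8859-1") -> bytes:
    """Re-encode a message body for a terminal's character set.

    Assumption: the sender labels the body with a `charset=` parameter
    on the Content-Type field. Unlabeled messages fall back to
    `default_charset`, which is an arbitrary choice for this sketch.
    """
    msg = message_from_bytes(raw_message)

    # Character set declared by the sender, if any (returned lowercased).
    declared = msg.get_content_charset() or default_charset

    # Raw body bytes, with any transfer encoding undone.
    body = msg.get_payload(decode=True)

    try:
        text = body.decode(declared, errors="replace")
    except LookupError:
        # The label names a character set this program does not know.
        text = body.decode(default_charset, errors="replace")

    # Characters the terminal cannot show are replaced with "?".
    return text.encode(terminal_charset, errors="replace")


if __name__ == "__main__":
    # An ISO 8859/2 message shown on a terminal using the IBM PC set.
    raw = b"Content-Type: text/plain; charset=iso-8859-2\n\n\xa3\xf3d\xbc\n"
    print(render_for_terminal(raw, "cp437"))
```

Replacing unmappable characters with "?" is the crudest possible answer to the "terminal doesn't support the character set" case raised above; a real user agent would more likely transliterate or switch the terminal's character set where it can.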