Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!cs.utexas.edu!uunet!intercon!amanda@intercon.uu.net
From: amanda@intercon.uu.net (Amanda Walker)
Newsgroups: comp.mail.misc
Subject: Re: 8-bit mail
Message-ID: <24-Jun-89.210351@192.41.214.2>
Date: 25 Jun 89 00:49:47 GMT
References: <742@maxim.erbe.se> <759@isaak.UUCP> <127@hafro.is>
Sender: news@intercon.UUCP
Reply-To: amanda@intercon.uu.net (Amanda Walker)
Organization: InterCon Systems Corporation, Sterling, VA
Lines: 43

This seems to be the year of the international character set :-). A large proportion of our customer base is located in western Europe, and so we've gotten pretty familiar with operating in environments that are not limited to 7-bit U.S. ASCII. Unfortunately, there are a lot of standards to choose from :-), even if you stick to one from the ISO.

If and when mail and news paths become 8-bit transparent (which I think would be a good idea), the situation will improve, as long as everyone cooperates. It sounds like a lot of the European UNIX community has standardized on ISO 8859/1, which is a step forward from ISO 646 (since it greatly widens the geographical area served by a single character set), but it still only puts the problem off for a while, and is only really useful for most of western Europe. Eastern Europe, parts of the Mediterranean, and the Pacific Rim countries are still left high and dry (to name a few). They don't have much presence in the global E-mail networks now, but it will only increase.

Even in western Europe, character sets are still a problem. There are an awful lot of people out there still using the DEC Multinational Character Set, which is similar to but not the same as ISO 8859/1. There are a lot of people using National Replacement Character Sets as well, although these are starting to go away as time goes on.

One of the biggest problems I have in writing code for MUAs and NUAs (News User Agents :-)) is determining what character set a given message is using.
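
To illustrate why the character set cannot be guessed from the bytes alone, here is a minimal sketch in Python. It assumes the Swedish national variant of ISO 646 and covers only the six code positions where that variant replaces US ASCII brackets and braces with letters. The table, the helper `decode_646`, and the sample bytes are made up for illustration and are not taken from any particular MUA.

```python
# Six code positions where the Swedish ISO 646 variant differs from US ASCII.
# (Assumed table for illustration; other positions are left as plain ASCII.)
SWEDISH_646 = {
    0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",
    0x7B: "ä", 0x7C: "ö", 0x7D: "å",
}

def decode_646(data: bytes, overrides: dict) -> str:
    """Decode 7-bit bytes, substituting national characters from `overrides`."""
    return "".join(overrides.get(b, chr(b)) for b in data)

# The same bytes on the wire, read under two different assumptions.
wire = b"Link|ping {r en stad"
as_ascii = decode_646(wire, {})             # reader assumes US ASCII
as_swedish = decode_646(wire, SWEDISH_646)  # reader assumes the Swedish variant

print(as_ascii)
print(as_swedish)
```

Both readings are valid 7-bit text, so nothing in the message body says which one the sender meant.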

One thing I would really like to see is for MUAs to start using the Content-Type: field (or at least X-Content-Type:) in RFC 822 messages. This way the MUA can have a set of common standards it knows about, and can translate to whatever the user wants without lots of fancy footwork. Also, for some representations, such as ISO 2022 or subsets thereof, you can even send things transparently through 7-bit channels as long as they don't filter out the ESC character.

ISO 8859/1 is just the start. Eventually, I hope the ISO finishes their multibyte character set standard (10646?), but who knows when that will happen...

--
Amanda Walker
InterCon Systems Corporation
--
amanda@intercon.uu.net | ...!uunet!intercon!amanda
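
A minimal sketch of the Content-Type idea from the post, in Python. The post does not specify a syntax for the field, so the form `text; charset=NAME` is an assumption made here, as are the table `KNOWN_CHARSETS`, the helper names, and the sample message. Folded header lines are not handled. The last few lines use Python's `iso2022_jp` codec, one ISO 2022 subset, to show that such an encoding stays within 7 bits and relies on ESC.

```python
# Assumed header form: "Content-Type: text; charset=NAME" (not from the post).
# Maps declared names to Python codec names the agent knows how to decode.
KNOWN_CHARSETS = {"us-ascii": "ascii", "iso-8859-1": "latin-1"}

def split_message(raw: bytes):
    """Split an RFC 822 style message into a lowercase header dict and a body."""
    head, _, body = raw.partition(b"\n\n")
    headers = {}
    for line in head.decode("ascii", errors="replace").split("\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body

def declared_charset(headers):
    """Return the charset named in Content-Type or X-Content-Type, else None."""
    value = headers.get("content-type") or headers.get("x-content-type")
    if not value:
        return None
    for param in value.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset":
            return val.strip().strip('"').lower()
    return None

def decode_body(raw: bytes) -> str:
    """Decode the body with the declared charset, falling back to 7-bit ASCII."""
    headers, body = split_message(raw)
    codec = KNOWN_CHARSETS.get(declared_charset(headers), "ascii")
    return body.decode(codec, errors="replace")

message = (
    b"From: someone@example.com\n"
    b"X-Content-Type: text; charset=iso-8859-1\n"
    b"\n"
    b"Sm\xf6rg\xe5sbord\n"
)
text = decode_body(message)
print(text)

# An ISO 2022 subset: every byte is below 0x80, with ESC switching character sets.
seven_bit = "日本語".encode("iso2022_jp")
print(all(b < 0x80 for b in seven_bit), b"\x1b" in seven_bit)
```

Without the header, the same body can only be decoded by guessing; this sketch falls back to ASCII and marks the two 8-bit bytes as undecodable.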