Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!olivea!apple!agate!ucbvax!CS.CHALMERS.SE!bernerus
From: bernerus@CS.CHALMERS.SE (Christer Bernerus)
Newsgroups: comp.soft-sys.andrew
Subject: Re: 8-bit characters, how to use ?
Message-ID:
Date: 9 Nov 90 09:49:04 GMT
References:
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 39

Excerpts from mail: 8-Nov-90 Re: 8-bit characters, how t.. Craig_Everhart@transarc. (389)

> Mail unformatting is done by the andrew/overhead/mail/lib/unscribe.c
> module. Is it always obvious how to turn accented characters into
> non-accented ones? I know some of the rules in German (what turns into
> (e.g.) oe, ae, ue, ss), but what rules apply to other languages?
> Swedish, for instance?

> Certainly the unscribe.c module pre-dates any consideration of 8-bit
> characters.

> Craig

Thanks for pointing out unscribe.c for me. I had a look at it, but it doesn't seem trivial to enhance it the way I wanted. What I had in mind was to use the compchar character table, which allows for "customary local replacements", preferably through the ATKToASCII function in textaux/compchar.c. However, unscribe.c doesn't seem to be part of the object-oriented stuff in ATK, so I'm very unsure how to do it in a proper way. It can of course be done as a "hack", but I feel that's a bit dangerous if e.g. the lib/compchar/comps format changes.

Regarding the way conversions should be done, there are usually many ways of doing this, even within a country, institution, group etc. So the problem isn't trivial, especially not for a mail gateway which does the unformatting. E.g. in mail from Sweden, å, ä, ö and even ü should probably be replaced with }, {, | and u, but if the letter came from Germany, maybe the replacements should be (there's no å in Germany), ae, oe and ue respectively. Converting the other way round is definitely non-trivial, especially if the latter replacements are used.
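To make the two conventions concrete, here is a minimal sketch in C of a replacement step that depends on where the mail comes from. It is not taken from unscribe.c or compchar.c and does not use ATKToASCII. The names convention, replacement and to_seven_bit are hypothetical, the input is assumed to be ISO 8859-1, only lowercase letters are handled, and falling back to "?" for a character without a rule is my own assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: which local replacement convention to apply. */
enum convention { CONV_SWEDISH, CONV_GERMAN };

/* Return the 7-bit replacement for one ISO 8859-1 byte,
 * or NULL if the chosen convention has no rule for it. */
static const char *replacement(unsigned char c, enum convention conv)
{
    int swedish = (conv == CONV_SWEDISH);

    switch (c) {
    case 0xE5: return swedish ? "}" : NULL;  /* a with ring: no German rule */
    case 0xE4: return swedish ? "{" : "ae";  /* a with diaeresis */
    case 0xF6: return swedish ? "|" : "oe";  /* o with diaeresis */
    case 0xFC: return swedish ? "u" : "ue";  /* u with diaeresis */
    case 0xDF: return swedish ? NULL : "ss"; /* sharp s: German only */
    default:   return NULL;
    }
}

/* Convert a NUL-terminated ISO 8859-1 string to 7-bit text.
 * Returns the number of bytes written (not counting the NUL),
 * or (size_t)-1 if the output buffer is too small. */
size_t to_seven_bit(const char *in, char *out, size_t outsize,
                    enum convention conv)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char plain[2] = { 0, 0 };
        const char *rep;
        size_t len;

        if (c < 0x80) {
            plain[0] = (char)c;          /* 7-bit input passes through */
            rep = plain;
        } else {
            rep = replacement(c, conv);
            if (rep == NULL)
                rep = "?";               /* assumption: no rule, emit '?' */
        }

        len = strlen(rep);
        if (n + len + 1 > outsize)       /* always keep room for the NUL */
            return (size_t)-1;
        memcpy(out + n, rep, len);
        n += len;
    }

    if (n + 1 > outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Note that the Swedish output reuses }, { and |, which are also ordinary ASCII characters, and the German output produces ae, oe and ue, which also occur in words that never had an umlaut. That is why the conversion cannot simply be run backwards.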
In my opinion, the only thing that really helps in the near future is an 8-bit extension to RFC822, which would make it "legal" to write mailers that support 8-bit mail transparently. It doesn't solve the whole world's problems, though.

Chris.