Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!decwrl!ucbvax!bloom-beacon!eru!luth!sunic!dkuug!dkuugin!keld
From: keld@login.dkuug.dk (Keld J|rn Simonsen)
Newsgroups: eunet.followup,comp.std.internat
Subject: Re: Code Page Conversion
Message-ID:
Date: 10 Aug 90 11:28:54 GMT
References: <1973@enea.se>
Sender: news@slyrf.dkuug.dk
Followup-To: eunet.followup
Lines: 81

sommar@enea.se (Erland Sommarskog) writes:

>Uwe Geuder (geuder@informatik.uni-stuttgart.de) writes:
>>From Keld J|rn Simonsen:
>> I use it in email, it is built into the sendmail we use here,
>> and EUnet has decided to run this on an experimental basis
>> on all the backbones of EUnet.
>>
>>What does this mean? When I get mail from Sweden, it's still in Swedish
>>ASCII (is that SSCII??), which is horrible to read on (US) ASCII devices
>>used in Germany (German 7-bit Code is never used here). If I run conv SE US
>>on such files they get much prettier. So I can't imagine that any host in
>>between has already done it. Or is there no "EUnet backbone" between Sweden
>>and Germany?

>I get a little anxious here, but I may misunderstand some things
>here. I certainly don't want mail I send out to be automatically
>transformed when they get out. Yes, I understand that occurrences
>of ][\}{| are not nice to read, but it seems a risky business to
>translate them straight off. If I use them in a non-Swedish mail,
>I usually explain them. With a non-wanted transformation, that
>would look a little stupid. (And how does the machine know that
>I use an "[" as a dotted capital "A" and not as a left bracket?)

>Wouldn't it be better, if this was done at receiver's end on request?

Yes, I share Erland's concerns. You cannot simply translate 7-bit [\]
(these code positions are defined as letters in both the Swedish and
the Danish 7-bit sets) to the ISO 8859-1 Swedish/Danish letters.
What we do at dkuug.dk (the Danish Internet backbone) is to transform
both 8-bit curly braces and Scandinavian letters to 7-bit [\].
The other way, from 7-bit Danish or Swedish to ASCII or some 8-bit
code, we normally do not touch these codes. The conversions we do here
are mostly for use on 8-bit machines, where some run ISO 8859-1
and some run an IBM code page.

Doing it at the receiver's end: well, the receiver needs to know what
information is in there. This information must be generated on the
sender's side, by the one who knows what the message is.

>Another question: Through a mailing-list I have indirectly received
>a list of two-character code stemming from Keld Simonsen. I don't
>know whether it is this one we discuss, but I would assume so. I
>must admit that I laid that one aside with the thought: "My God,
>how unreadable and what an overkill!" I tend to think I missed
>some points with its purpose. Could Keld or anyone else clarify?

Yes, I have made a quite elaborate list of character names, which
is being used for mail. It is designed for worldwide use, and
the world is big. There are about 940 characters in it, covering
all 7-bit and 8-bit character sets I know of. It does not yet contain
any Japanese or Chinese characters. The character names are primarily
used to identify a character and to record its properties, such as
membership of a character set, or that it is a lower case character,
in which case the corresponding upper case character can be specified
alongside. The names do have some mnemonic value; e.g. a with
diaeresis (a-umlaut) is called "a:". How readable and beautiful this is
can always be discussed, but there are some rules to it which are
consistently applied. It has also been designed with short character
names, to improve compactness and reduce translation costs, and also
to improve readability and writability.

>And a final question: we are moving into an eight-bit world. Instead
>of relying on old standards, why not aim to have Eunet work with ISO
>8859/1 instead? (8859 is apparently already obsolete with the recent
>changes in Eastern Europe, but that is another matter.)

I am collaborating with a fellow countryman of yours, Dan Oscarsson
from LTH, on using the new ISO 10646 character set for email.
This character set has almost all characters in the world in
a 32-bit, compactable code set.

No, ISO 8859 is not outdated. ISO 8859-2 covers Eastern Europe, and
ISO 8859-5 covers Russia (Cyrillic). 8859 does not cover Japanese
and other Eastern character sets, though. This was the reason we
decided on ISO 10646.

Keld Simonsen
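
The asymmetry described in the reply above (8-bit letters can be folded
onto the 7-bit positions [\], but 7-bit text cannot safely be converted
back) can be sketched in a few lines of Python. This is only an
illustration, not the conversion code used at dkuug.dk: the table name
DANISH_7BIT and both function names are assumptions made for this
example, and only the six Danish letter positions are covered.

```python
# Illustrative sketch only; names are hypothetical, not from any real tool.
# In the Danish 7-bit set these six ASCII code positions display as letters.
DANISH_7BIT = {
    "Æ": "[", "Ø": "\\", "Å": "]",
    "æ": "{", "ø": "|", "å": "}",
}


def to_danish_7bit(text):
    """Fold ISO 8859-1 Danish letters onto their 7-bit code positions.

    Real brackets and braces already occupy those positions, so they
    pass through unchanged and become indistinguishable from letters.
    """
    return text.translate(str.maketrans(DANISH_7BIT))


def candidates_from_7bit(ch):
    """List every 8-bit character that a 7-bit character could stand for."""
    result = [ch]  # it may simply be the ASCII character itself
    result += [letter for letter, code in DANISH_7BIT.items() if code == ch]
    return result


if __name__ == "__main__":
    print(to_danish_7bit("Jørn Æble [x]"))  # J|rn [ble [x]
    print(candidates_from_7bit("["))        # ['[', 'Æ']
```

The first call shows why the 8-bit to 7-bit direction is mechanical; the
second shows why the reverse needs information from the sender, since a
"[" may be either a bracket or a letter.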