Xref: utzoo news.software.b:5185 news.misc:5021
Path: utzoo!utstat!news-server.csri.toronto.edu!math.lsa.umich.edu!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!uunet!mcsun!isgate!krafla!frisk
From: frisk@rhi.hi.is (Fridrik Skulason)
Newsgroups: news.software.b,news.misc
Subject: Re: Time for 8 bit news, isn't it?????.
Summary: Yes - it sure is.
Message-ID: <1857@krafla.rhi.hi.is>
Date: 22 Jul 90 11:52:57 GMT
References: <1990Jul13.022224.25441@lth.se> <3119.269d97ea@mccall.com> <777@hades.ausonics.oz.au> <15688@bfmny0.BFM.COM> <+7Y$AV&@rpi.edu> <1990Jul21.091529.29557@lth.se>
Reply-To: frisk@rhi.hi.is (Fridrik Skulason)
Followup-To: news.software.b
Organization: University of Iceland (RHI)
Lines: 100
Keywords:

Well - some of us have 8-bit news already - I am for example using an
8-bit 'rn' right now. The program only required a few minor
modifications to work properly.

The reason we went to 8-bit news and E-mail is quite simple - our
alphabet contains 10 characters not found in standard ASCII.

Of course I can only post 8-bit articles to our local newsgroups - the
rest of the world is still only 7-bit :-(

I fully agree that we need an 8-bit news system (as well as 8-bit
E-mail), as this would make life a lot easier for those of us not using
English.

Modifying the news software to permit the transmission of 8-bit data is
trivial - the real problem is the character set issue.

I don't know if the readers of this group are familiar with a similar
discussion regarding automatic translation between character sets in
the Kermit program. The conclusions reached there seem to apply to the
8-bit News/E-mail discussion as well, though.

Some possible solutions:

(1) Each machine posts articles using the user's character set of
    choice. To indicate which character set is used, a new field is
    added to the header.
    Examples:

        Character-set: CP 870
        Character-set: ISO 8859/4

    This is easy to implement, but has one serious drawback - all
    machines are required to be able to handle all possible character
    sets.

(2) On every machine the article is translated into one of the
    ISO 8859/x series of character sets. 8859/1 would probably be most
    used, as it covers most of the languages of Western Europe. 8859/2,
    8859/3, 8859/4 etc. would solve the needs of those using Greek,
    various Eastern European languages and (I think) Hebrew and Arabic.

    This would not solve the problem of those using a 16-bit character
    set. Also, I am not sure if Esperanto is included in any of the
    ISO 8859/x standards.

(3) All text is transmitted according to the ISO 10646 standard. This
    has one advantage compared to (2) - it allows the transmission of
    documents containing 16-bit characters, as well as documents
    containing characters from more than one of the 8859/x standards.
    For example, one could send a message with the first part in
    Russian and the second part in Greek.

My opinion is that (3) is more of a long-term goal - for 95% of the
users of Usenet, (2) is all that is needed.

But what changes would (2) require?

Change #1: Any ASCII computer on Usenet must accept 8-bit news and
           E-mail, and be able to forward articles without changes
           (in other words - don't strip the eighth bit !!!)

           This is the only change required from the "English-only"
           ASCII sites, where no 8-bit articles would originate or be
           read.

Change #2: Any computer on Usenet using an extended version of ASCII
           (CP 437, ISO 8859/x etc.) must translate all postings to
           one of the 8859/x character sets and indicate (in the
           header) which one is used.

           This change would be required from European and other
           non-English-speaking users.

Change #3: Any computer not using ASCII, but rather EBCDIC (or
           something else), must translate all postings to one of the
           8859/x character sets, instead of just translating to
           ASCII.

Change #4: Any computer must accept postings in one of the 8859/x
           character sets and be able to translate them to the
           character set used by each user.

Problem #1: If the local character set is not able to represent all
            the characters in the original posting, they must be
            represented as well as possible. For example - a 7-bit
            computer receiving a text containing accented vowels
            might be expected just to drop the accent marks.

Problem #2: Different users - even on the same machine - have
            different capabilities to display 8-bit text. For
            example, in Scandinavia it is common for terminals to use
            a 7-bit character set, where some of the characters (for
            example { [ ] } |) have been replaced by non-ASCII
            characters. Other users in the same countries have fully
            8-bit terminals (for example PCs running a terminal
            emulator).

            The computer must store incoming articles as they arrive,
            and the news/E-mail software must be updated to display
            them according to the capabilities of each terminal, as
            indicated by an environment variable.

So - what now? Is there any interest in creating a "working group" to
attack the problem? Any of the authors of rn, nn, elm or other
news/E-mail software out there?

We are of course willing to share our modifications to the programs,
and with a bit of work we should be able to have 8-bit news/E-mail
running in a few months.

So - any volunteers?

-- 
Fridrik Skulason      University of Iceland  |
Technical Editor of the Virus Bulletin (UK)  |  Reserved for future expansion
E-Mail: frisk@rhi.hi.is    Fax: 354-1-28801  |
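
To make Change #4 and Problems #1 and #2 a bit more concrete, here is a
minimal sketch in Python of how a news reader might display a stored
8-bit article. It is only an illustration under stated assumptions, not
the modification made to 'rn': the header-to-codec table, the function
names and the NEWS_CHARSET environment variable are all hypothetical,
and "as well as possible" is reduced to dropping accent marks and
printing '?' for anything that still cannot be shown.

```python
import os
import unicodedata

# Hypothetical mapping from the proposed "Character-set:" header values
# to Python codec names; only two entries are shown.
HEADER_TO_CODEC = {
    "ISO 8859/1": "iso8859_1",
    "ISO 8859/4": "iso8859_4",
}


def decode_article(body: bytes, header_value: str) -> str:
    """Decode a stored article body according to its Character-set: header."""
    return body.decode(HEADER_TO_CODEC[header_value])


def terminal_charset(environ=os.environ) -> str:
    """Return the reader's display codec (NEWS_CHARSET is a made-up name)."""
    return environ.get("NEWS_CHARSET", "ascii")


def to_local(text: str, codec: str) -> str:
    """Represent text as well as the local character set allows."""
    out = []
    for ch in text:
        try:
            ch.encode(codec)          # character exists locally: keep it
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose the character and drop the accent marks (Problem #1).
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        try:
            base.encode(codec)
            out.append(base)
        except UnicodeEncodeError:
            out.append("?")           # nothing suitable: visible placeholder
    return "".join(out)


# The article is stored exactly as it arrived (here: ISO 8859/1 bytes).
stored = "Friðrik Skúlason".encode("iso8859_1")
text = decode_article(stored, "ISO 8859/1")

print(to_local(text, "ascii"))                                        # Fri?rik Skulason
print(to_local(text, terminal_charset({"NEWS_CHARSET": "iso8859_1"})))  # Friðrik Skúlason
```

The translation happens only at display time, so the same stored
article can serve both a 7-bit terminal and a fully 8-bit one. A real
implementation would want per-language fallback tables (for example
"th" for the Icelandic thorn) rather than a bare '?'.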