Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!wildcat!anoosh
From: anoosh@mips.COM (Anoosh Hosseini)
Newsgroups: news.software.b
Subject: Re: New USENET header: Language
Keywords: OSI
Message-ID: <44547@mips.mips.COM>
Date: 5 Jan 91 07:41:35 GMT
References: <1990Dec29.093002.10739@lth.se> <1991Jan4.082150.5895@ugle.unit.no>
Sender: news@mips.COM
Lines: 48

It may be worthwhile to step back and look at the global picture. What is the mission? Internationalized News readers? OK. Let's assume this is the goal.

In my opinion, we may want to look at the problem at 3 layers.

The bottom layer is pure data. Here we all agree that we need to move to an environment where 8 bit data can be created and sent smoothly across all systems. This will allow any type of encoding, including multi-byte encodings.

The next layer is transportation and management of articles. This already exists, and the only thing worth mentioning is that we agree that all header format/information remains in ASCII English.

The top layer is presentation, and it is the area needing the most rework. So far we have stayed with the ASCII display terminal as the common denominator. In this new environment, articles will no longer just be in French, German, or Italian, but also in some really exotic languages which require much more sophisticated support.

So two trains of thought come to mind: one is for the existing ASCII News readers to tolerate articles which may have non-ASCII message bodies, and the other is standard methodologies to describe the encodings used and the resources needed to post/display multi-lingual articles.

The new ASCII News reader will need to determine, from the additional header field, whether the encoding used within the body is one it does not support. In such a case, it may wish to skip over the article. On the other hand, the fact that a News reader supports the ISO 10646 encoding does not mean it can display every character set encapsulated within the standard.
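As a rough sketch of that header check, here is how a reader might decide from the headers alone whether to show an article. It is written in Python for brevity. The header name `X-Body-Encoding` and the set of supported encodings are made up for illustration; neither comes from any agreed standard or from this thread.

```python
# Hypothetical header name; no name had been agreed on in this discussion.
ENCODING_HEADER = "x-body-encoding"

# Encodings this particular reader can display (an assumed example set).
SUPPORTED = {"us-ascii", "iso-8859-1"}


def parse_headers(article: bytes) -> dict:
    """Return the ASCII headers of an article as a lowercase-keyed dict.

    Only the part before the first blank line is decoded, so an 8 bit
    body never reaches the ASCII decoder. Folded (continuation) header
    lines are not handled in this sketch.
    """
    head, _, _body = article.partition(b"\n\n")
    headers = {}
    for line in head.decode("ascii").split("\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def should_display(article: bytes, supported=SUPPORTED) -> bool:
    """Decide whether the body uses an encoding this reader supports."""
    headers = parse_headers(article)
    # Assumption: an article with no declaration has a plain ASCII body.
    declared = headers.get(ENCODING_HEADER, "us-ascii").lower()
    return declared in supported
```

Because the headers stay in ASCII, the check never has to look at the 8 bit body. A reader would simply skip any article for which `should_display` returns false.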
It would be nice to have a complete solution, but that may be a ways off. Due to the research and background required to implement interfaces for some of these languages, most internationalization efforts have been localized. That is, we may have a Japanese interface, but it may not do Hebrew or Arabic, and vice versa. So we will need notation to specify subsets of standards.

ISO 10646 encompasses many other ISO standards, such as the 8859-X series. The latter are single byte (8 bit) codes and may be all that is needed for those who will at most use 2 languages (English/X). Just specifying 8859-X may be easier, since the presentation layer of that particular News reader will not display any other character sets. In any case, anyone hoping to get any attention will need to encode in formats that people are reading in. :-)

We have found Xrn to have a good abstraction model, allowing easy foreign language support. The display of the message body is passed to an X11 entity (text widget) which is responsible for all screen management and editing. The interface is basically a string, and the text widget can hide all the encoding and display management, leaving the Xrn News reader basically intact.

-anoosh
--