Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!math.lsa.umich.edu!caen!umich!terminator!pisa.ifs.umich.edu!rees
From: rees@pisa.ifs.umich.edu (Jim Rees)
Newsgroups: news.software.b
Subject: Re: New USENET header: Language
Message-ID: <4ecb2c8b.1bc5b@pisa.ifs.umich.edu>
Date: 24 Dec 90 20:43:36 GMT
References: <1990Dec22.081718.2109@looking.on.ca> <4ec122f7.1bc5b@pisa.ifs.umich.edu> <1990Dec22.231146.17316@watmath.waterloo.edu> <1990Dec23.030622.12129@looking.on.ca>
Sender: usenet@terminator.cc.umich.edu (usenet news)
Reply-To: rees@citi.umich.edu (Jim Rees)
Organization: University of Michigan IFS Project
Lines: 27

In article <1990Dec23.030622.12129@looking.on.ca>, brad@looking.on.ca (Brad Templeton) writes:

  It is my understanding that 8 (or more) bit representations of the
  non-roman character set are not fully standardized.  Is this wrong?
  In such an event, you need two headers -- one for the natural
  language, and another for the encoding format.

Some are standardized, some are not.  I think rather than have the
header specify the encoding, we should pick one encoding for each
language and make that the standard on Usenet.  I would hope we can
pick the technically superior encoding.  For example, some European
encodings substitute non-ASCII symbols (umlauts, e.g.) for '|', '{',
'}', etc.  That's a bad idea.  We should adopt something like Latin-1
that adds the new symbols to the old set.

Most of the encodings I've seen put ASCII in the bottom half of the
8-bit set.  This lets you mix ASCII with a foreign language without
having to change modes.

  Note that encoding format can be a broad format, as it is really for
  computer consumption.  Valid forms could be ASCII (with underlining)
  -- the default, various extended-Ascii formats, Katanji etc. but also
  binary, image, rich text etc.

There are all kinds of nice multi-media things we could do, but for now
I think we should keep it simple.
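
The difference between an encoding that substitutes symbols and one that adds them to the old set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the German 7-bit national variant (DIN 66003) stands in for the "substituting" encodings mentioned above, its override table is written out by hand, and the function names are made up for this example.

```python
# Assumption: DIN 66003 (the German ISO 646 variant) is used as the example
# of an encoding that reuses ASCII code points for national letters.
DIN_66003_OVERRIDES = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_din_66003(data: bytes) -> str:
    """Decode 7-bit bytes the way a DIN 66003 terminal would show them."""
    return "".join(DIN_66003_OVERRIDES.get(b, chr(b)) for b in data)


def ascii_is_preserved(codec: str) -> bool:
    """Return True if every byte below 0x80 decodes to its ASCII character."""
    return all(bytes([b]).decode(codec) == chr(b) for b in range(0x80))


c_source = b"if (a || b) { x[i] = 0; }"

# Substituting encoding: the punctuation is displaced by German letters.
print(decode_din_66003(c_source))

# Adding encoding: Latin-1 leaves the bottom half alone, so the same bytes
# still read as ASCII.
print(c_source.decode("latin-1"))
print(ascii_is_preserved("latin-1"))

# ASCII punctuation and German letters mixed in one line, no mode switch:
# the German letters live in the top half (high bit set).
mixed = "while (größe > 0) { n--; }".encode("latin-1")
print([hex(b) for b in mixed if b >= 0x80])
```

Under the substituting table the braces, brackets and bars of the C fragment come out as umlauts, while the Latin-1 decode leaves them intact; that is the practical argument for keeping ASCII in the bottom half.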