Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!math.lsa.umich.edu!caen!umich!terminator!pisa.ifs.umich.edu!rees
From: rees@pisa.ifs.umich.edu (Jim Rees)
Newsgroups: news.software.b
Subject: Re: New USENET header: Language
Message-ID: <4ecb2c8b.1bc5b@pisa.ifs.umich.edu>
Date: 24 Dec 90 20:43:36 GMT
References: <1990Dec22.081718.2109@looking.on.ca> <4ec122f7.1bc5b@pisa.ifs.umich.edu> <1990Dec22.231146.17316@watmath.waterloo.edu> <1990Dec23.030622.12129@looking.on.ca>
Sender: usenet@terminator.cc.umich.edu (usenet news)
Reply-To: rees@citi.umich.edu (Jim Rees)
Organization: University of Michigan IFS Project
Lines: 27

In article <1990Dec23.030622.12129@looking.on.ca>, brad@looking.on.ca (Brad Templeton) writes:

  It is my understanding that 8 (or more) bit representations of the non-roman
  character set are not fully standardized.  Is this wrong?
  
  In such an event, you need two headers -- one for the natural language, and
  another for the encoding format.

Some are standardized, some are not.  I think rather than have the header
specify the encoding, we should pick one encoding for each language and make
that the standard on Usenet.  I would hope we can pick the technically
superior encoding.  For example, some European encodings substitute
non-ASCII symbols (umlauts, for example) for '|', '{', '}', etc.  That's a bad
idea.  We should adopt something like Latin-1 (ISO 8859-1), which adds the new
symbols to the old set instead of replacing any of it.
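To make the problem concrete, here is a rough sketch of what happens to a line
of C when the same bytes are shown on a terminal using the German national
variant of ISO 646 (DIN 66003), one of the substituting encodings.  The
substitution table is written out by hand because Python ships no codec for
it, and the function name show_on_din66003 is made up for this illustration.

```python
# Hand-written table for the German national variant of ISO 646 (DIN 66003):
# these 7-bit code points hold letters instead of ASCII punctuation.
DIN_66003 = {
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
}

def show_on_din66003(text: str) -> str:
    """Render ASCII text the way a DIN 66003 display would show the same bytes."""
    return "".join(DIN_66003.get(ch, ch) for ch in text)

source = "if (a || b) { x[i] = 0; }"
print(show_on_din66003(source))  # prints: if (a öö b) ä xÄiÜ = 0; ü

# Latin-1 keeps '{' at 0x7B and gives 'ä' its own code point, 0xE4,
# so both characters can appear in the same article.
print("{".encode("latin-1"), "ä".encode("latin-1"))  # prints: b'{' b'\xe4'
```

The C fragment turns into gibberish under the substituting encoding, while the
additive one keeps the braces and still has room for the umlauts.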

Most of the encodings I've seen put ASCII in the bottom half of the 8-bit set
(0x00-0x7F).  This lets you mix ASCII with a foreign language without having
to change modes.

  Note that encoding format can be a broad format, as it is really for
  computer consumption.  Valid forms could be ASCII (with underlining) -- the
  default, various extended-Ascii formats, Katanji etc. but also binary,
  image, rich text etc.

There are all kinds of nice multi-media things we could do, but for now I
think we should keep it simple.