Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!ames!ncar!ico!rcd
From: rcd@ico.isc.com (Dick Dunn)
Newsgroups: comp.text
Subject: Re: International character set requirements needed
Message-ID: <1990Dec20.012516.23623@ico.isc.com>
Date: 20 Dec 90 01:25:16 GMT
References: <1990Dec17.210354.1626@cbnewsl.att.com> <7625@castle.ed.ac.uk>
Organization: Interactive Systems Corporation, Boulder, CO
Lines: 45

yfcw14@castle.ed.ac.uk (K P Donnelly) writes:

> It sounds to me as if what people are asking for is for troff to stop
> stripping the eighth bit off characters in the input file, but instead
> to pass them to the output file just like (7-bit) ASCII characters.

It's not at all that simple.  Troff has to know about the characters--it
needs to be able to find them in its width tables and to know whether they
have ascenders and/or descenders (for the sb/st/ct number registers).
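
To make the point concrete, here is a rough sketch (in Python, not anything
troff actually contains) of the kind of lookup involved.  The table, its
width values, and the name measure() are all made up for illustration; the
flag values assume the ditroff font-file convention (1 = descender,
2 = ascender, 3 = both).  A character with no table entry simply cannot be
measured, whether or not its eighth bit survived the trip.

```python
# Hypothetical font table: character -> (width in units, flags).
# Widths are invented for illustration; flags assume the ditroff
# font-file convention: 0 = neither, 1 = descender, 2 = ascender, 3 = both.
FONT = {
    "a": (44, 0),
    "d": (50, 2),
    "p": (50, 1),
    "\u00e5": (44, 2),  # a-ring: usable only because it has an entry
}


def measure(text, font=FONT):
    """Return (total width, combined ascender/descender flags) for text."""
    width = 0
    flags = 0
    for ch in text:
        if ch not in font:
            # No width-table entry: the formatter cannot set this character.
            raise KeyError("no font entry for %r" % ch)
        w, f = font[ch]
        width += w
        flags |= f  # accumulate ascender/descender bits
    return width, flags
```

With this table, measure("pad") gives a width of 144 and flags of 3 (both an
ascender and a descender), while measure() on an o-umlaut fails because the
table has no entry for it.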

There's also an issue of whether troff should produce 8-bit codes on
its output--there are some good arguments that it should not.  The matter
of 7-bit data paths is rather more complicated (and clumsy) than the single
issue of a parity bit that Donnelly mentions.  There are some methods of
data interchange, such as most email systems, that are inherently 7-bit.
It would be nice if we could just banish them, but compatibility is an
albatross.

The issue of inventing alternate representations, such as \(ao for "a ring",
goes beyond the issue of simple 8-bit transparency.  There are many more
characters needed than can be represented in an 8-bit code set.  Certainly
one wants a conventional 8-bit set (such as Latin 1) for convenience, but
more characters are needed even for European usage.  It is useful to have a
canonical representation in terms of 7-bit codes even if it's not the most
commonly used.
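
As a sketch of what a canonical 7-bit form buys you, the following Python
fragment rewrites text so that only 7-bit codes remain.  The function name
and the table are hypothetical; the only entry is the \(ao name mentioned
above, and a real table would need an agreed name for every character.

```python
# Hypothetical escape table.  Only \(ao is taken from the discussion;
# any further entries would have to be agreed on separately.
ESCAPES = {"\u00e5": r"\(ao"}


def to_seven_bit(text, escapes=ESCAPES):
    """Rewrite text using only 7-bit codes, via named escapes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)           # already a 7-bit code
        elif ch in escapes:
            out.append(escapes[ch])  # replace with its canonical name
        else:
            raise ValueError("no 7-bit name for %r" % ch)
    return "".join(out)
```

The result can pass through a 7-bit mail path unharmed, which is the whole
point of having such a form even if nobody types it by hand.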

> The Scandinavians have up til now used "national versions" of ASCII in
> which characters like { } ~ | are replaced by national characters like
> a-ring...

These are not ASCII.  They are national versions of ISO 646.  If you like,
you could think of ASCII as a "national version" of ISO 646 used in the
USA.  ISO 646 reserves a few code positions for national characters;
ASCII is one particular assignment of those positions.  The Scandinavian
conventions are simply different assignments.
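
A small Python sketch of the idea: the same 7-bit codes, read through
different assignment tables.  The tables are deliberately partial and cover
only six of the national positions (the brackets, braces, backslash, and
vertical bar); the full national versions reassign a few other positions as
well, which this sketch ignores.  The function and table names are my own.

```python
# Partial assignment tables for six of the ISO 646 national positions.
# Every other code is treated as in ASCII, which is a simplification.
SWEDISH_646 = {
    0x5B: "\u00c4", 0x5C: "\u00d6", 0x5D: "\u00c5",  # A-umlaut, O-umlaut, A-ring
    0x7B: "\u00e4", 0x7C: "\u00f6", 0x7D: "\u00e5",  # a-umlaut, o-umlaut, a-ring
}
DANISH_NORWEGIAN_646 = {
    0x5B: "\u00c6", 0x5C: "\u00d8", 0x5D: "\u00c5",  # AE, O-slash, A-ring
    0x7B: "\u00e6", 0x7C: "\u00f8", 0x7D: "\u00e5",  # ae, o-slash, a-ring
}
ASCII_646 = {}  # ASCII: the six positions keep [ \ ] { | }


def decode_646(data, national):
    """Interpret 7-bit codes under one national assignment."""
    out = []
    for code in data:
        if code > 0x7F:
            raise ValueError("not a 7-bit code: %#x" % code)
        out.append(national.get(code, chr(code)))
    return "".join(out)
```

The bytes "bl}b{r" read as "bl}b{r" under the ASCII assignment and as the
Swedish word for blueberries under the Swedish one.  Nothing in the data
says which table applies; that has to be known by convention.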

> ...The Germans use in computing the alternative system of
> placing an 'e' after the vowel instead of an umlaut sign above it...

This alternative representation far predates computer usage, although it is
certainly a convenient solution.  Note also that scharfes ess turns into
"ss".
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...Mr. Natural says, "Use the right tool for the job."