Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!ames!ncar!ico!rcd
From: rcd@ico.isc.com (Dick Dunn)
Newsgroups: comp.text
Subject: Re: International character set requirements needed
Message-ID: <1990Dec20.012516.23623@ico.isc.com>
Date: 20 Dec 90 01:25:16 GMT
References: <1990Dec17.210354.1626@cbnewsl.att.com> <7625@castle.ed.ac.uk>
Organization: Interactive Systems Corporation, Boulder, CO
Lines: 45

yfcw14@castle.ed.ac.uk (K P Donnelly) writes:
> It sounds to me as if what people are asking for is for troff to stop
> stripping the eighth bit off characters in the input file, but instead
> to pass them to the output file just like (7-bit) ASCII characters.

It's not at all that simple.  Troff has to know about the characters--it
needs to be able to find them in its width tables and know whether the
characters have ascenders and/or descenders (for sb/st/ct number regs).
There's also an issue of whether troff should produce 8-bit codes on its
output--there are some good arguments that it should not.

The matter of 7-bit data paths is rather more complicated (and clumsy) than
the single issue of a parity bit that Donnelly mentions.  There are some
methods of data interchange, such as most email systems, that are
inherently 7-bit.  It would be nice if we could just banish them, but
compatibility is an albatross.

The issue of inventing alternate representations, such as \(ao for "a ring"
goes beyond the issue of simple 8-bit transparency.  There are many more
characters needed than can be represented in an 8-bit code set.  Certainly
one wants a conventional 8-bit set (such as Latin 1) for convenience, but
more characters are needed even for European usage.  It is useful to have a
canonical representation in terms of 7-bit codes even if it's not the most
commonly used.

> The Scandinavians have up til now used "national versions" of ASCII in
> which characters like { } ~ | are replaced by national characters like
> a-ring...
These are not ASCII.  They are national versions of ISO 646.  If you like,
you could think of ASCII as a "national version" of ISO 646 used in the
USA.  646 provides a few codes which are reserved for national characters;
ASCII provides a particular assignment to those codes.  The Scandinavian
conventions are simply different assignments.

> ...The Germans use in computing the alternative system of
> placing an 'e' after the vowel instead of an umlaut sign above it...

This alternative representation far predates computer usage, although it is
certainly a convenient solution.  Note also that scharfes ess turns into
"ss".
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...Mr. Natural says, "Use the right tool for the job."
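
To make the ISO 646 point above concrete, here is a minimal Python sketch
of how the same 7-bit codes read differently under different national
assignments.  The table and function names (ISO646_LETTERS, decode_iso646)
are made up for illustration, and the tables are deliberately partial: they
list only the six letter positions in the bracket/brace range, whereas the
real national standards reassign a few other codes as well.

```python
# Illustrative, partial tables: only the six letter positions that replace
# [ \ ] { | } are listed.  The actual national standards differ from ASCII
# in a few more positions, which this sketch leaves untouched.
ISO646_LETTERS = {
    "ascii": {},
    "swedish": {
        0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",
        0x7B: "ä", 0x7C: "ö", 0x7D: "å",
    },
    "danish_norwegian": {
        0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å",
        0x7B: "æ", 0x7C: "ø", 0x7D: "å",
    },
}


def decode_iso646(data: bytes, variant: str) -> str:
    """Interpret 7-bit codes according to one (partial) national assignment."""
    table = ISO646_LETTERS[variant]
    out = []
    for code in data:
        if code > 0x7F:
            raise ValueError("ISO 646 is a 7-bit code; got 0x%02X" % code)
        # Codes not reassigned by the variant keep their ASCII meaning.
        out.append(table.get(code, chr(code)))
    return "".join(out)


if __name__ == "__main__":
    raw = b"bl}b{r"
    for variant in ("ascii", "swedish", "danish_norwegian"):
        print(variant, decode_iso646(raw, variant))
```

The bytes never change; only the assignment applied to the reserved codes
does, which is why a file written under one convention shows braces and
bars when viewed under another.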