Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uunet!mcsun!ukc!edcastle!yfcw14
From: yfcw14@castle.ed.ac.uk (K P Donnelly)
Newsgroups: comp.text
Subject: Re: International character set requirements needed
Keywords: dwb, troff, postscript
Message-ID: <7625@castle.ed.ac.uk>
Date: 19 Dec 90 10:54:07 GMT
References: <1990Dec17.210354.1626@cbnewsl.att.com>
Organization: Edinburgh University Computer Services
Lines: 49

I have come in on this discussion from the outside, but it sounds as if
you have misunderstood the requests. It sounds to me as if what people are
asking for is for troff to stop stripping the eighth bit off characters in
the input file, and instead to pass them to the output file just like
(7-bit) ASCII characters. There is no need to invent new (7-bit) ASCII
representations of non-ASCII characters, such as \ao for a-ring. Such
representations may be desirable for other purposes, but that is a
separate issue.

Anyone with a VT220 or VT320 terminal can input a-ring using the three
character sequence Compose a *, and the character gets hexadecimal code E5
in compliance with the ISO standard for western European languages
(ISO 8859-1). You see it on the screen and edit it just like any other
character, using any editor (such as version 3.10 of microEmacs) which
doesn't strip the eighth bit. However, it gets very frustrating if the
software which gets between you and the laser printer insists for no good
reason on stripping the eighth bit and turning the a-ring into an 'e'
(ASCII code 65 hex). There is a lot of such software around, especially on
Unix. I think it has something to do with the eighth bit having been used
as a parity check in the past, so the software thought it was safest to
filter it out.
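
To make the E5 -> 65 arithmetic concrete, here is a minimal sketch in Python. The function name strip_eighth_bit is my own invention for illustration and is not taken from troff or any other package; it only assumes that "stripping the eighth bit" means masking each byte with 7F hex.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as 7-bit-only software does.

    Hypothetical helper for illustration; not from any real package.
    """
    return bytes(b & 0x7F for b in data)


# a-ring is the single byte 0xE5 in ISO 8859-1 (Latin-1).
a_ring = "\u00e5".encode("latin-1")
assert a_ring == b"\xe5"

# 0xE5 is 1110 0101 in binary; clearing the top bit leaves
# 0110 0101, which is 0x65, the ASCII code for 'e'.
assert strip_eighth_bit(a_ring) == b"e"

# A whole word is damaged the same way: "g\u00e5rd" comes out as "gerd".
word = "g\u00e5rd".encode("latin-1")
print(strip_eighth_bit(word).decode("ascii"))

# Plain 7-bit ASCII passes through unchanged, which is why the
# problem goes unnoticed by people who only write English.
assert strip_eighth_bit(b"troff") == b"troff"
```

An 8-bit-clean program would simply pass the bytes through untouched, which is all that is being asked for.
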
Nowadays, with cleaner communications lines, the eighth bit is hardly ever
used as a parity check - it wasn't a very good system anyway - and
software packages which in the past have stripped the eighth bit are one
by one changing their policies - witness Kermit 3.0, TeX 3.0, microEmacs
3.10.

The Scandinavians have up till now used "national versions" of ASCII in
which characters like { } ~ | are replaced by national characters like
a-ring. This often makes their names look weird in mail signatures, and
must cause them a lot of trouble when programming in languages like C and
Pascal.

In computing, the Germans use the alternative system of placing an 'e'
after the vowel instead of an umlaut sign above it. The French have such a
variety of accented characters that in computing (mail messages and so on)
they usually give up and leave out the accents. Sometimes they use devices
like putting an apostrophe before or after the vowel to indicate an acute
accent. On the Gaelic language bulletin board in which I participate, we
always use a slash, '/', after the vowel to indicate an acute accent.

The Icelanders have many more non-ASCII characters in their language than
other Scandinavians, including some unique ones such as 'eth' and 'thorn'
where you can't just "leave out the accent", so they have long been ahead
of the world in using 8-bit text in their computing.

Kevin Donnelly