Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!wuarchive!uunet!mcsun!ukc!slxsys!jclark!jjc
From: jjc@jclark.UUCP (James Clark)
Newsgroups: comp.text
Subject: Re: International (8 bit clean) troff proposal
Summary: groff does it already
Message-ID:
Date: 31 Dec 90 14:05:44 GMT
References: <1990Dec27.155046.14520@cbnewsl.att.com>
Sender: jjc@jclark.uucp (James Clark)
Organization: None, London, England
Lines: 72
In-Reply-To: npn@cbnewsl.att.com's message of 27 Dec 90 15:50:46 GMT

In article <1990Dec27.155046.14520@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:

   The consensus appears to be:

   1. Allow all DWB components to read 8-bit characters as defined
   by ISO 8859-1, a.k.a Latin-1. The editing and preparation of such
   documents is the province of 8-bit terminals, 8-bit editors, and
   not our concern. This requires that we remove all &177's.

groff already does this.

   2. Default behavior for troff should be "8-bit in, 8-bit out".
   The postprocessors will be rewritten to take this into account.

groff already does this.

   In addition, we should allow a "-7b" option to force troff output
   to be in the ASCII (ISO 646, 7 bit) subset. This would permit
   mailing of ditroff output to the part of North America that
   hasn't caught on to ISO 8859.

I'm unconvinced by this. What's wrong with using uuencode? In any
case, if you want to send a document to somebody, it would seem to me
to be better to send either the ditroff input file or the
postprocessor output (since the ditroff output is tailored to a
particular device anyway).

   3. Recognize two-character 7-bit escapes so that people who don't
   have 8-bit terminals can still create documents with the extra
   characters, [KSA] have proposed a reasonable standard convention
   which could serve as both input and output for troff. (e.g., \(oa
   for 'aring') but there are other proposals we will look at as
   well.

groff already does this.
It uses the two-character names described in [KSA]. It would be a
pity if DWB adopted an incompatible scheme.

   4. Reserve \C'string' and \N'number' for the truly odd characters
   that don't have a more convenient representation.

groff takes this approach.

   5. Hyphenation may present insurmountable problems; we'll see if
   anyone else (e.g. Knuth) has solved them. Worst case, however, is
   that we'll hyphenate badly, and you'll have to turn it off.

I believe groff has a good solution to the hyphenation problem.
Hyphenation works in terms of hyphenation codes. Initially, the
letters `a' to `z' have `a' to `z' as their hyphenation codes, and `A'
to `Z' have `a' to `z'. There's a request that allows you to specify
the hyphenation code for any normal or special character; for example,

	.hcode \(^a a

would give `\(^a' (the name for `a' with a circumflex accent) a
hyphenation code of `a'. Groff uses the same hyphenation algorithm
that TeX does (invented by Frank Liang): the hyphenation process is
controlled by a set of hyphenation patterns; letters in the patterns
are interpreted as hyphenation codes. By supplying an appropriate
file of patterns and set of `hcode' requests, it should be possible to
make groff correctly hyphenate languages other than English.

   We will probably package this as DWB 3.2, which will be an
   "incremental" upgrade to DWB 3.1 (this means a minor fee for
   those who are DWB 3.1 licensees). Some of the work has already
   been completed, so the package should be ready around May 1991.

These features are in the currently released version of groff (0.6).

James Clark
jjc@jclark.uucp
jjc@ai.mit.edu
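
To make the hyphenation-code idea above concrete, here is a rough
Python sketch of how per-character hyphenation codes might feed a
Liang-style pattern matcher. It is not groff's implementation: the
function names, the `PATTERNS` table and the data layout are all
invented for illustration, and the nine patterns are only a small
sample in the style of TeX's English set. A real implementation would
also refuse breaks too close to either end of the word, which this
sketch omits.

```python
def default_hcodes():
    """Initial codes: `a'..`z' map to themselves, `A'..`Z' to `a'..`z'."""
    codes = {}
    for c in "abcdefghijklmnopqrstuvwxyz":
        codes[c] = c
        codes[c.upper()] = c
    return codes


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    values[k] is the weight of the gap before the k-th letter.
    """
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, hcodes):
    """Return the indices n where a break is allowed before word[n]."""
    codes = [hcodes.get(ch) for ch in word]
    if None in codes:
        # A character without a hyphenation code: do not hyphenate.
        return []
    text = "." + "".join(codes) + "."      # dots mark the word edges
    scores = [0] * (len(text) + 1)
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            values = patterns.get(text[i:j])
            if values:
                for k, v in enumerate(values):
                    scores[i + k] = max(scores[i + k], v)
    # text[0] is the leading dot, so the gap before word[n] is scores[n + 1].
    # Odd weights permit a break, even weights forbid one.
    return [n for n in range(1, len(word)) if scores[n + 1] % 2 == 1]


def show(word, breaks):
    """Insert '-' at each permitted break point."""
    out = []
    for n, ch in enumerate(word):
        if n in breaks:
            out.append("-")
        out.append(ch)
    return "".join(out)


# Illustrative sample patterns only (an assumption, not a complete set).
PATTERNS = dict(parse_pattern(p) for p in
                ["hy3ph", "he2n", "hena4", "hen5at", "1na",
                 "n2at", "1tio", "2io", "o2n"])

codes = default_hcodes()
print(show("hyphenation", hyphenate("hyphenation", PATTERNS, codes)))
# prints: hy-phen-ation

# Rough analogue of `.hcode \(^a a': give a-circumflex the code `a'.
codes["\u00e2"] = "a"
```

With the extra code in place, a (made-up) word containing a-circumflex
is matched against the same patterns as its unaccented spelling;
without it, the sketch simply declines to hyphenate the word.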