Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!wuarchive!uunet!mcsun!ukc!slxsys!jclark!jjc
From: jjc@jclark.UUCP (James Clark)
Newsgroups: comp.text
Subject: Re: International (8 bit clean) troff proposal
Summary: groff does it already
Message-ID: <JJC.90Dec31140544@jclark.jclark.UUCP>
Date: 31 Dec 90 14:05:44 GMT
References: <1990Dec27.155046.14520@cbnewsl.att.com>
Sender: jjc@jclark.uucp (James Clark)
Organization: None, London, England
Lines: 72
In-Reply-To: npn@cbnewsl.att.com's message of 27 Dec 90 15:50:46 GMT

In article <1990Dec27.155046.14520@cbnewsl.att.com> npn@cbnewsl.att.com (nils-peter.nelson) writes:

   The consensus appears to be:
   1. Allow all DWB components to read 8-bit characters as defined
   by ISO 8859-1, a.k.a. Latin-1.  The editing and preparation of
   such documents is the province of 8-bit terminals, 8-bit editors,
   and not our concern.  This requires that we remove all &177's.

groff already does this.
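
I'm assuming the `&177's' are the C idiom `c & 0177', which clears the
eighth bit of every character.  The short Python sketch below shows why
that mask has to go: every Latin-1 letter above 0x7F is silently folded
onto an unrelated ASCII letter.  The function name is made up for
illustration.

```python
def mask_0177(data):
    """Apply the C idiom `c & 0177` to every byte, clearing bit 8."""
    return bytes(b & 0o177 for b in data)

word = "Bl\u00e5b\u00e6r".encode("latin-1")  # a-ring is 0xE5, ae is 0xE6
print(mask_0177(word))  # b'Blebfr': 0xE5 -> 0x65 'e', 0xE6 -> 0x66 'f'
```

Plain ASCII input is unaffected, which is why the masks went unnoticed
for so long.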

   2. Default behavior for troff should be "8-bit in, 8-bit out".
   The postprocessors will be rewritten to take this into account.

groff already does this.

   In addition, we should allow a "-7b" option to force troff
   output to be in the ASCII (ISO 646, 7-bit) subset.  This would permit
   mailing of ditroff output to the part of North America that
   hasn't caught on to ISO 8859.

I'm unconvinced by this.  What's wrong with using uuencode?  In any
case, if you want to send a document to somebody, it seems better to
me to send either the ditroff input file or the postprocessor output,
since the ditroff output is tailored to a particular device anyway.
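
To make the uuencode point concrete, here is a minimal Python sketch
(assuming Python 3.7+ for the `backtick' argument of binascii.b2a_uu)
that turns arbitrary 8-bit output into lines containing only 7-bit
ASCII and back again.  It produces only the body lines, not the
`begin'/`end' wrapper the uuencode program adds; the helper names are
hypothetical.

```python
import binascii


def uuencode_body(data):
    """Encode 8-bit data as uuencode body lines of at most 45 input bytes.

    backtick=True writes '`' instead of a space for zero, so lines do not
    end in blanks that a mailer might strip.
    """
    return [binascii.b2a_uu(data[i:i + 45], backtick=True)
            for i in range(0, len(data), 45)]


def uudecode_body(lines):
    """Reverse uuencode_body."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))  # every 8-bit value, e.g. Latin-1 output
lines = uuencode_body(payload)
assert all(b < 0x80 for line in lines for b in line)  # pure 7-bit ASCII
assert uudecode_body(lines) == payload
```

The cost is about a third more bytes, but no change to troff or the
postprocessors is needed.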

   3. Recognize two-character 7-bit escapes so that people who
   don't have 8-bit terminals can still create documents with
   the extra characters.  [KSA] have proposed a reasonable standard
   convention (e.g., \(oa for `aring') which could serve as both
   input and output for troff, but there are other proposals we
   will look at as well.

groff already does this.  It uses the two-character names described in
[KSA].  It would be a pity if DWB adopted an incompatible scheme.
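
As an illustration of how such names let 8-bit text travel in 7 bits,
here is a minimal Python sketch that converts between Latin-1 text and
\(xx escapes.  Apart from \(oa and \(^a, which appear above, the name
table is an assumption taken from present-day groff_char(7); the exact
set in groff 0.6 and [KSA] may differ.  The sketch also ignores other
troff escapes and literal backslashes.

```python
# Assumed name table (a small subset; names as in present-day groff).
LATIN1_TO_ESCAPE = {
    "\u00e0": "`a",  # a grave
    "\u00e1": "'a",  # a acute
    "\u00e2": "^a",  # a circumflex
    "\u00e4": ":a",  # a diaeresis
    "\u00e5": "oa",  # a ring
    "\u00e7": ",c",  # c cedilla
}
ESCAPE_TO_LATIN1 = {v: k for k, v in LATIN1_TO_ESCAPE.items()}


def to_7bit(text):
    """Replace each known 8-bit character with its \\(xx escape."""
    out = []
    for ch in text:
        if ch in LATIN1_TO_ESCAPE:
            out.append("\\(" + LATIN1_TO_ESCAPE[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            raise ValueError(f"no two-character name for {ch!r}")
    return "".join(out)


def from_7bit(text):
    """Turn known \\(xx escapes back into 8-bit characters."""
    out, i = [], 0
    while i < len(text):
        name = text[i + 2:i + 4]
        if text.startswith("\\(", i) and name in ESCAPE_TO_LATIN1:
            out.append(ESCAPE_TO_LATIN1[name])
            i += 4
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, to_7bit("K\u00e5re") gives the troff input `K\(oare'.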

   4. Reserve \C'string' and \N'number' for the truly odd characters
   that don't have a more convenient representation.

groff takes this approach.

   5. Hyphenation may present insurmountable problems; we'll see
   if anyone else (e.g. Knuth) has solved them.  Worst case,
   however, is that we'll hyphenate badly, and you'll have to
   turn it off.

I believe groff has a good solution to the hyphenation problem.
Hyphenation works in terms of hyphenation codes.  Initially, the
letters `a' to `z' have `a' to `z' as their hyphenation codes, and `A'
to `Z' have `a' to `z'.  The `hcode' request lets you specify the
hyphenation code for any normal or special character; for example,

  .hcode \(^a a

would give `\(^a' (the name for `a' with a circumflex accent) a
hyphenation code of `a'.  Groff uses the same hyphenation algorithm
that TeX does (invented by Frank Liang): the hyphenation process is
controlled by a set of hyphenation patterns; letters in the patterns
are interpreted as hyphenation codes.  By supplying an appropriate
file of patterns and set of `hcode' requests, it should be possible to
make groff correctly hyphenate languages other than English.
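
The following toy Python sketch shows how patterns and hyphenation codes
fit together in a Liang-style hyphenator.  It is not groff's code: the
function names, the single made-up pattern `a1b', and the rule that a
word containing an uncoded character is left alone are all
simplifications for illustration.  Real pattern files contain thousands
of entries.

```python
import string

# Initial hyphenation codes as described above: `a'..`z' map to
# themselves and `A'..`Z' map to the corresponding lower-case letter.
hcode = {c: c for c in string.ascii_lowercase}
hcode.update({c.upper(): c for c in string.ascii_lowercase})
# Rough equivalent of `.hcode \(^a a': a-circumflex gets the code `a'.
hcode["\u00e2"] = "a"


def parse_pattern(pat):
    """Split a Liang pattern like 'a1b' into ('ab', [0, 1, 0]).

    digits[i] is the value of the inter-letter position before letter i.
    """
    letters, digits = [], [0]
    for ch in pat:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters.append(ch)
            digits.append(0)
    return "".join(letters), digits


def compile_patterns(pattern_list):
    """Map each pattern's letter string to its list of position values."""
    return dict(parse_pattern(p) for p in pattern_list)


def hyphenate(word, patterns, codes, left_min=2, right_min=2):
    """Return the pieces of `word` between permitted hyphenation points."""
    mapped = [codes.get(ch) for ch in word]
    if None in mapped:
        return [word]  # simplification: uncoded character, no hyphenation
    # Patterns match hyphenation codes, not raw characters; '.' marks
    # the word boundaries, as in TeX pattern files.
    work = "." + "".join(mapped) + "."
    points = [0] * (len(work) + 1)
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            values = patterns.get(work[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    # points[m + 1] is the value between word[m - 1] and word[m];
    # an odd value permits a break there.
    pieces, start = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


patterns = compile_patterns(["a1b"])  # toy pattern: allow a break in `ab'
print(hyphenate("R\u00e2bot", patterns, hcode))  # ['Râ', 'bot']
```

Because the pattern is matched against codes, the one pattern `a1b'
covers `ab', `Ab' and, after the hcode request, `\(^ab' as well; without
a code for `\(^a' the word is not hyphenated at all.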

   We will probably package this as DWB 3.2, which will be an
   "incremental" upgrade to DWB 3.1 (this means a minor fee for
   those who are DWB 3.1 licensees). Some of the work has already
   been completed, so the package should be ready around May 1991.

These features are in the currently released version of groff (0.6).

James Clark
jjc@jclark.uucp
jjc@ai.mit.edu