Path: utzoo!telly!eci386!ecicrl!clewis
From: clewis@ecicrl.UUCP (Chris Lewis)
Newsgroups: comp.text
Subject: Re: International character set requirements needed
Message-ID: <1025@ecicrl.UUCP>
Date: 21 Dec 90 08:17:49 GMT
References: <1990Dec17.210354.1626@cbnewsl.att.com> <7625@castle.ed.ac.uk> <1990Dec20.012516.23623@ico.isc.com>
Reply-To: clewis@ecicrl.UUCP (Chris Lewis)
Organization: Elegant Communications Inc., Ottawa, Canada
Lines: 108

In article <1990Dec20.012516.23623@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>yfcw14@castle.ed.ac.uk (K P Donnelly) writes:
>> It sounds to me as if what people are asking for is for troff to stop
>> stripping the eighth bit off characters in the input file, but instead
>> to pass them to the output file just like (7-bit) ASCII characters.

>It's not at all that simple. Troff has to know about the characters--it
>needs to be able to find them in its width tables and know whether the
>characters have ascenders and/or descenders (for sb/st/ct number regs).

Donnelly is right, though: that's primarily what people *do* want. (Note that "8-bit clean" in ditroff doesn't imply that ditroff passes 8-bit characters directly to the printer. On the contrary, it only means that 8-bit characters can appear in the ditroff *input* file, and that 8-bit characters can be found (somehow) in the width tables. There's no particular reason that the ditroff output file (troff(5)) must actually contain 8-bit characters, for this file isn't really intended to be read, and the "c" directive could be extended to permit octal or some other 7-bit clean representation if necessary.)
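To illustrate the point about the "c" directive, here is a minimal Python sketch of one possible 7-bit clean encoding. It is purely an assumption for illustration: ditroff does not define a backslash-octal form for "c", and the function names are made up.

```python
def encode_c_directive(ch: str) -> str:
    """Emit a 'c' directive for one Latin-1 character, keeping it 7-bit clean."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError(f"not a Latin-1 character: {ch!r}")
    # Printable ASCII (other than the escape character) goes through as-is.
    if 0x21 <= code <= 0x7E and ch != "\\":
        return "c" + ch
    # Everything else: backslash plus exactly three octal digits (hypothetical form).
    return "c\\%03o" % code


def decode_c_directive(directive: str) -> str:
    """Inverse of encode_c_directive."""
    if not directive.startswith("c"):
        raise ValueError(f"not a c directive: {directive!r}")
    body = directive[1:]
    if len(body) == 4 and body[0] == "\\":
        return chr(int(body[1:], 8))
    if len(body) == 1 and body != "\\":
        return body
    raise ValueError(f"malformed c directive: {directive!r}")
```

With something like this, ditroff could accept 8-bit input while the intermediate file itself never contains a byte above 0x7F; the filter simply decodes the escape before looking up the output byte.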
[Refresher on ditroff guts:

    troff document -> ditroff -> ditroff intermediate -> filter -> printer
                         ^                                 ^
                         |                                 |
                         +----------------+----------------+
                                          |
                                    width tables

The width tables contain the width and kerning information that ditroff needs for character placement, and also contain the byte that the filter emits for each character (though the filter doesn't have to use them). The ditroff intermediate is a displayable file with simple commands indicating character placement, font, point size, etc.]

It shouldn't be all that hard for ditroff to permit 8-bit characters in the width tables. After all, it does permit octal sequences in the fourth field (the character the backend is to emit to generate the desired glyph). It would be nicest if the leftmost column (the character ditroff is searching for) could be 8-bit, but octal would probably serve in a pinch. Permitting both would be even better, and would permit transmission of these files over 7-bit paths, editing with 7-bit vi's, etc. Psroff's analogous tables do permit this. 'Twould be especially nice now that the newest vi's are 8-bit clean, and emacs is now as well.

>There's also an issue of whether troff should produce 8-bit codes on
>its output--there are some good arguments that it should not. The matter
>of 7-bit data paths is rather more complicated (and clumsy) than the single
>issue of a parity bit that Donnelly mentions. There are some methods of
>data interchange, such as most email systems, that are inherently 7-bit.
>It would be nice if we could just banish them, but compatibility is an
>albatross.

Since you're talking about troff generating 8-bit codes on its output, I'm not sure that this is a real issue, because the intermediate ditroff output isn't really an interchange format. Regardless, what people want is the ability to jam 8-bit characters into the input of ditroff and have it do sane things; how those characters are represented in the intermediate file doesn't much matter.
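The "octal or raw 8-bit in the leftmost column" idea for the width tables can be sketched as a line reader. This assumes charset lines of four blank- or tab-separated fields (name, width, kerning, code) with the code written C-style (leading 0 for octal, 0x for hex). The \ooo escape in the name column, the Glyph type and the function names are all hypothetical, and real ditroff font files have details this sketch ignores.

```python
import re
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str     # input character ditroff searches for (first column)
    width: int    # character width (second column)
    kerning: int  # ascender/descender information (third column)
    code: int     # byte the filter emits to the printer (fourth column)


def parse_code(field: str) -> int:
    """C-style number: 0x.. is hex, a leading 0 is octal, otherwise decimal."""
    if field.lower().startswith("0x"):
        return int(field[2:], 16)
    if len(field) > 1 and field.startswith("0"):
        return int(field[1:], 8)
    return int(field, 10)


def parse_name(field: str) -> str:
    """Accept a raw (possibly 8-bit) character or a hypothetical \\ooo octal escape."""
    if len(field) == 4 and field[0] == "\\" and all(c in "01234567" for c in field[1:]):
        return chr(int(field[1:], 8))
    return field


def parse_charset_line(line: str) -> Glyph:
    # Split on ASCII blanks/tabs only: str.split() would also treat the
    # Latin-1 no-break space (0xA0) as a separator and lose that glyph.
    fields = re.split(r"[ \t]+", line.strip(" \t\r\n"))
    if len(fields) < 4:
        raise ValueError(f"expected 4 fields, got {line!r}")
    name, width, kerning, code = fields[:4]
    return Glyph(parse_name(name), int(width), int(kerning), parse_code(code))
```

Both "\345 50 0 0345" and a line starting with a literal a-ring byte yield the same Glyph, which is the "permit both" behaviour: the 7-bit form survives mail and old editors, the 8-bit form is what people would rather type.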
As it is now, the width tables *do* permit the filter to emit 8-bit characters to the printer - they *have* to. You might have trouble getting 8-bit characters to the printer, but that's the OS's/system administrator's/printer designer's fault.

On the other hand, permitting ditroff to accept 8-bit characters on *input* may get people into trouble when they try to mail something through a 7-bit path. But that isn't all that difficult to solve, either by uuencoding (or similar) or by having a program that converts the 8-bit characters to the \(xx convention and vice versa. (I'm in fact going to be implementing something like this in Psroff.) People solve this all the time when shipping PC binaries.

Requiring all of Europe to type those silly 4-character sequences when editing documents in their own language, when 8-bit is *easy*, isn't a very nice thing to do to faithful customers.

And it isn't *just* Europe. It's Canada too. (I'm a member of the CSA/Treasury Board Canadian Posix Working Group.) Canada is also trying to encourage Latin-1 because of bilingual (English/French) requirements in both government and the private sector. 7-bit ASCII is very nearly *only* the USA: most other English-speaking countries are tending either towards Latin-1 or towards a different national version of 646, and 646 in all its national variants completely satisfies only a minority of the Roman-alphabet countries.

Lest one think that Canada is a minor addition to Western Europe in this context, one should remember that Canada is the US's biggest single trading partner by a substantial margin (considerably larger than Japan). The only market bigger than Canada is the EEC taken as a unit (aka Western Europe, aka the other Latin-1 countries). Markets, markets! Heck, if you're even thinking about Kanji, you really should first satisfy the considerably larger group of customers that need Latin-1.
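A converter between 8-bit Latin-1 input and the \(xx convention could be as simple as the following Python sketch. The name table is hypothetical (only \(ao comes from this thread) and would have to match whatever names the width tables actually define; the function names are also made up.

```python
import re

# Hypothetical Latin-1 -> \(xx name table; a real one must match the width tables.
LATIN1_TO_NAME = {
    "\u00e5": "ao",  # a ring
    "\u00e9": "e'",  # e acute
    "\u00e8": "e`",  # e grave
    "\u00e7": "c,",  # c cedilla
    "\u00f6": "o:",  # o umlaut
    "\u00fc": "u:",  # u umlaut
}
NAME_TO_LATIN1 = {name: ch for ch, name in LATIN1_TO_NAME.items()}

# Matches \( followed by a two-character name.
# Simplification: an escaped backslash ("\\(ao") is not recognised as such.
ESCAPE = re.compile(r"\\\((..)")


def to_7bit(text: str) -> str:
    """Replace each 8-bit character with its \\(xx escape for a 7-bit path."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        elif ch in LATIN1_TO_NAME:
            out.append("\\(" + LATIN1_TO_NAME[ch])
        else:
            raise ValueError(f"no \\(xx name for {ch!r}")
    return "".join(out)


def to_8bit(text: str) -> str:
    """Turn known \\(xx escapes back into 8-bit characters; leave others (e.g. \\(em) alone."""
    return ESCAPE.sub(lambda m: NAME_TO_LATIN1.get(m.group(1), m.group(0)), text)
```

The same pair can be run at either end of a 7-bit link, so the 4-character sequences only exist while the document is in transit and nobody has to type them.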
>The issue of inventing alternate representations, such as \(ao for "a ring"
>goes beyond the issue of simple 8-bit transparency.

It *isn't* transparency; quite the contrary. HOWEVER, having both 8-bit input transparency and alternate representations that need only 7 bits would be a definite plus, so that people with 646-compliant terminals can do the same things that people with the newer Latin-1 ones can. And it ain't all that hard to do. Hell, if I can do it in psroff via CAT troff without source, AT&T should be able to do it with ditroff.

-- 
Chris Lewis, Phone: (613) 832-0541
UUCP: uunet!utai!lsuc!ecicrl!clewis
Moderator of the Ferret Mailing List (ferret-request@eci386)
Psroff mailing list (psroff-request@eci386)