Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!noao!rstevens
From: rstevens@noao.edu (Rich Stevens)
Newsgroups: comp.text
Subject: Re: Quality of computer typesetting?
Summary: my experiences (long)
Message-ID: <1991Mar15.134229.23183@noao.edu>
Date: 15 Mar 91 13:42:29 GMT
References: <1991Mar14.215651.13961@dartvax.dartmouth.edu>
Organization: National Optical Astronomy Observatories, Tucson AZ
Lines: 107

Well, I'm not a typesetter or a book designer, just a picky author. You might want to take a look at my recent book, "UNIX Network Programming" (Prentice Hall, 1990). (It's been out for 14 months now, so most technical libraries should have it.) I used troff, DWB 2.0 to be exact, and had a lot of problems that I had to work around. Here's the saga.

First, troff's hyphenation algorithm isn't perfect. I ended up having to make a hyphenation exception list with about 150 words in it. But vanilla troff only allows an exception list of 128 bytes, so I had to take the sources and increase the list size. One advantage of having the sources.

Finding all the hyphenation errors was a pain. What I had to do was write some shell scripts that ran troff with the -a flag, found all the hyphenated words, and looked them up in my own list to see if I'd ever encountered that word hyphenated before. If not, I printed the word, looked it up by hand in the dictionary, and entered it into my list with its hyphenation points. I looked around for machine-readable dictionaries that included hyphenation points but couldn't find any. (I'm still not aware of any that could help me with this specific problem.)

Also, troff doesn't handle words in the exception list correctly if the word contains a ligature. So words like "specification" that it doesn't hyphenate correctly can't be put in the exception list. You have to find them and insert the hyphenation character (\%) at all the right points. Another script.
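The hyphenation check described above can be sketched roughly as follows. This is not the original shell script, only an illustration in Python. It assumes that the ASCII output being scanned shows a broken word as a line ending in a hyphen, with the rest of the word at the start of the next line. The function names and the "head-tail" format of the known list are made up for the example.

```python
import re

# Assumption: a line ending in "letters-" followed by a line that starts
# with letters is a word the formatter split at that point.
BREAK = re.compile(r"([A-Za-z]+)-$")
REST = re.compile(r"^\s*([A-Za-z]+)")

def hyphenated_words(lines):
    """Yield (head, tail) for each word split across two consecutive lines."""
    for cur, nxt in zip(lines, lines[1:]):
        head = BREAK.search(cur.rstrip())
        tail = REST.match(nxt)
        if head and tail:
            yield head.group(1).lower(), tail.group(1).lower()

def unseen_breaks(lines, known):
    """Return break points ('head-tail') that are not in the known set."""
    found = []
    for head, tail in hyphenated_words(lines):
        key = head + "-" + tail
        if key not in known and key not in found:
            found.append(key)
    return found

if __name__ == "__main__":
    page = ["this is a speci-", "fication of the net-", "work layer"]
    print(unseen_breaks(page, {"net-work"}))
```

Each break point the sketch reports still has to be checked against a dictionary by hand. It also cannot tell a formatter-inserted hyphen from a real compound that happens to end a line, so some of its hits will be false.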
The next problem is that troff hyphenates too many lines in a row, so I had to write another script that looked for more than two consecutive hyphenated lines. I often went in by hand and rewrote a few words to get around these.

The next problem was actual page layout. My book had *lots* of figures and listings, some of which had to stay entirely on one page. Just as I was finishing the book, the article by Kernighan and Van Wyk appeared on their troff post-processor that did this. Unfortunately the software wasn't generally available. (It's still not really available, in my mind, since it takes $20,000 to get the source and no one is selling binaries. Also, I'm not sure whether the page maker software is in DWB 3.1.) What I ended up doing was going through the entire text (700+ pages) and forcing page breaks where I wanted them. You do page 1, get it right, then page 2, then page 3, ... Fortunately I had a nice terminal to do this on (an AT&T 630), but it was a long process. I swore I wouldn't do that again. A few times I ended up rewording something just to get the page break where I wanted it.

The next problem is the placement of dashes with current fonts. I don't like the placement of hyphens, en-dashes, or em-dashes with most fonts. Hyphens are set at the x-height for lower case characters, which is fine for hyphenation at the end of the line, but looks awful (IMHO) in words such as MS-DOS. (Fortunately only a few MS-DOSs in a Unix book!) Also, with the font I used (PostScript Times Roman) the en-dash and em-dash touched most letters on either side of the dash. I didn't like that, or the fact that an en-dash between two digits is really an en-dash between two upper case characters (so it should be a little higher). I ended up writing scripts that "fixed" all this by moving certain characters around before troff got hold of them. Parentheses are a similar problem, as most fonts have parens set for lower case characters.
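The check for runs of hyphenated lines mentioned earlier is simple enough to sketch. The Python below is an illustration only, not the script actually used, and it assumes that a hyphenated line is any line of formatted output ending in "-". The function name and the max_run parameter are made up for the example.

```python
def hyphen_runs(lines, max_run=2):
    """Return (first_line, length) for each run of more than max_run
    consecutive lines that end in a hyphen. Line numbers are 1-based."""
    runs = []
    start = None  # line number where the current run began, if any
    for number, line in enumerate(lines, 1):
        if line.rstrip().endswith("-"):
            if start is None:
                start = number
        else:
            if start is not None and number - start > max_run:
                runs.append((start, number - start))
            start = None
    # A run may extend to the last line of the input.
    if start is not None and len(lines) + 1 - start > max_run:
        runs.append((start, len(lines) + 1 - start))
    return runs

if __name__ == "__main__":
    sample = ["net-", "pro-", "speci-", "done", "one-", "two-", "end"]
    for first, length in hyphen_runs(sample):
        print("lines %d-%d: %d hyphenated in a row"
              % (first, first + length - 1, length))
```

The reported line numbers only say where to look. Deciding which words to rewrite is still a manual job.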
My final gripe about troff-based systems is the lack of any "grammar software". I had diction (the old, old, BSD-distributed version) but I have to wade through so many false hits with it that I don't use it much. I have some bad writing habits that I want to find automatically. My solution to this is yet another script that looks for certain phrases that I've accumulated over time and prints the offending line. Things such as starting a sentence with "However", using words like "essentially", etc. A good grammar-checking package for Unix would be nice to have. (My gut feeling is that such packages might give me more false hits than diction, but it would be nice to at least have the option.)

Would I use troff again -- yes, but I'm now using groff. It has TeX's hyphenation algorithm, which saves me a lot of problems, and has an option to limit the number of lines in a row that it hyphenates. The font problems I still have to deal with, but I've done it before. Page layout is still a problem. I may end up writing a simple version of the Kernighan & Van Wyk package (I don't need multi-column formatting, for example) if I have the time before the next book is finished.

I'm not a TeX user, mainly because I've been using troff for almost 14 years and am familiar with it. Also, I don't think TeX by itself generates good books. Knuth's books look great, but I've seen two books by another author that were made using TeX that look pretty bad. I think 90% of the final appearance depends on the author, not the system that was used.

I end up doing so much pre-processing and post-processing of troff's input and output that I don't think using a package like Frame is my answer either. (It's funny, when I picked up the first Unix book I'd seen that specifically said it was made with Frame and not troff, the second page I looked at had a blatant hyphenation error. Made my day.)
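The phrase-hunting script mentioned above (the one that catches a sentence starting with "However" or a use of "essentially") might look something like this. It is a sketch in Python, not the actual script. Only those two patterns come from the post, and the PATTERNS table and function name are made up for the example.

```python
import re

# Hypothetical pattern table: (compiled regex, reason shown to the author).
# Only "However" at the start of a sentence and "essentially" come from
# the post; a real list would grow over time.
PATTERNS = [
    (re.compile(r"(?:^|[.!?]\s+)However\b"), 'sentence starts with "However"'),
    (re.compile(r"\bessentially\b", re.IGNORECASE), 'uses "essentially"'),
]

def flag_lines(lines):
    """Return (line_number, reason, line) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, 1):
        for pattern, reason in PATTERNS:
            if pattern.search(line):
                hits.append((number, reason, line))
    return hits

if __name__ == "__main__":
    text = [
        "This works. However, it is slow.",
        "It is essentially a filter.",
        "Nothing, however, is wrong here.",
    ]
    for number, reason, line in flag_lines(text):
        print("%d: %s: %s" % (number, reason, line))
```

Because it works a line at a time, the sketch misses a phrase that is split across two input lines, and it would also flag text inside troff requests or macros.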
The best-looking troff-generated books that I've seen are Van Wyk's "Data Structures and C Programs" and another one by Ravi Sethi (that I can't remember the name of). I think both were made using the K & VW page maker software. The 10th Edition Unix manuals from Murray Hill also look very nice.

Some of the worst troff books I've seen have actually been printed by a laser printer at 300 dpi. Yuck. It only costs about $5/page to typeset a book that's already in PostScript, so there's really no excuse for laser-printed books today.

	Rich Stevens (rstevens@noao.edu)