Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!rice!uw-beaver!fluke!inc
From: inc@tc.fluke.COM (Gary Benson)
Newsgroups: comp.text.sgml
Subject: Re: looking for more information
Message-ID: <1990Dec11.214726.8463@tc.fluke.COM>
Date: 11 Dec 90 21:47:26 GMT
References: <200@tivoli.UUCP> <1990Nov28.105230.10365@tc.fluke.COM> <215@tivoli.UUCP>
Organization: John Fluke Mfg. Co., Inc., Everett, WA
Lines: 131

In article <215@tivoli.UUCP> lark@tivoli.UUCP (Lar Kaufman) writes:

>In case I have misled anyone, I hasten to admit that I am very much a 
>student of SGML, not a master.  I have _never_actually_used_ SGML in a 
>product, so my knowledge is only theoretical.  I am only now in a position 
>to begin working with SGML concepts and proto-SGML software.  I agree with 
>Gary's proposal, and I hope someone with a practical knowledge will accept 
>the task of maintaining a FAQ file.
>
>Gary also mentions tools written in perl.  I would love to see people 
>volunteering code and techniques for implementing SGML solutions.  I know 
>that others have written programs for converting structural information 
>to/from SGML using various languages (such as Icon).  Where are they?  
>Has anyone considered setting up an FTP site for SGML tools?
>
>A final comment:  we should remember to distinguish between SGML, the 
>standard, and various software products that implement it.  It can be 
>confusing to mix these.  For example, when I say that you can imbed 
>chapters in an SGML document, I do not imply any knowledge of how a 
>specific SGML product does it (or doesn't do it).


Whoops, I didn't mean to set you up for guru-hood, Lar! It's just that your
posting was well-written and informative without being esoteric to the point
of meaninglessness. I hope this newsgroup can be a place for a wide-spectrum
discussion of SGML, but so far, it has seemed weighted toward theory, and I
found your posting to be a refreshing breath of reality.

As to your idea about people posting code and techniques, I can say this --
we have several man-years of programming invested in our quasi-SGML
autocoding programs, and I'm sure I'd be in big trouble if I disseminated
them. However, our techniques are rather interesting (to us, at least), and
I was surprised to see no response to my earlier query about whether others
are using similar techniques.

Long ago, back when we typeset all of Fluke's technical manuals, a decision
was made in the Publications Department to attempt to keep the writing
function as separate as possible from the production function. We defined
production as encompassing page design, preparation of files for
typesetting, typesetting itself, layout, and of course printing, binding,
and so on.

There have been two very interesting results from that decision:

    1. While the industry as a whole has moved to "desk-top publishing", we
       find ourselves with few peers with whom to discuss methods. We still
       have our staff typing in raw text, having rejected the "Mac on every
       desk" approach.

    2. We are in an excellent position to take advantage of new software
       tools because we have a lot of experience with implied markup
       techniques.

In our approach, the writer's file has an absolute minimum of explicit
instructions or codes. We have long used the string ---n (n being the
heading level) at the end of a line to mark a heading. This is basically the
only "coding" our writers do in files. Everything else is recognized by
context or through regular expression pattern matching, something that perl
is extremely adept at.
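
As a rough illustration of the ---n convention (not the Fluke program, which
is written in perl and is not public), the check can be sketched in Python.
The exact pattern, the single-digit level, and the <h1>..<h4> style output
codes are all assumptions made for this example.

```python
import re

# Assumed form: heading text, then "---" and one digit at the end of the line.
HEADING_RE = re.compile(r"^(?P<title>.*\S)\s*---(?P<level>\d)\s*$")

def parse_heading(line):
    """Return (level, title) if the line ends in ---n, otherwise None."""
    m = HEADING_RE.match(line)
    if m is None:
        return None
    return int(m.group("level")), m.group("title")

def code_heading(line):
    """Emit a hypothetical generic code for a heading; pass other lines through."""
    parsed = parse_heading(line)
    if parsed is None:
        return line
    level, title = parsed
    return "<h%d>%s" % (level, title)
```

A line such as "Theory of Operation ---2" would be reported as a level 2
heading, while ordinary running text is returned unchanged.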

We use a perl program to scan the file and determine what objects are
present. Figure titles are identified by the following string, appearing on
a line by itself:

			Figure n-n. arbitrary text title

When our coding program comes across that string, there is only one possible
generic code to send to the output file: <figure>. We are toying with the
idea of having the title end with a "higher level generic code" like the
heading level indicators. This would serve as a cue from writer to
gencoding program indicating the desired size of the illustration. For
example, "Figure 3-3. Arbitrary Text Title/1" might indicate a full-page
illustration, while changing the number to 2, 3, or 4 would indicate half,
third and quarter pages respectively.
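
The figure-title rule, including the proposed /n size cue, might look
something like the following Python sketch. The regular expression, the
function name parse_figure, and the size names are assumptions for
illustration; only the "Figure n-n. title" form and the meaning of /1
through /4 come from the description above.

```python
import re

# "Figure n-n. title" on a line by itself, with an optional trailing /1../4.
FIGURE_RE = re.compile(
    r"^\s*Figure\s+(?P<chapter>\d+)-(?P<number>\d+)\.\s+"
    r"(?P<title>.+?)(?:/(?P<size>[1-4]))?\s*$"
)

# Size cue as proposed: 1 = full page, 2 = half, 3 = third, 4 = quarter.
SIZE_NAMES = {1: "full", 2: "half", 3: "third", 4: "quarter"}

def parse_figure(line):
    """Return a dict describing a figure title line, or None if it is not one."""
    m = FIGURE_RE.match(line)
    if m is None:
        return None
    size = m.group("size")
    return {
        "chapter": int(m.group("chapter")),
        "number": int(m.group("number")),
        "title": m.group("title"),
        "size": SIZE_NAMES[int(size)] if size else None,
    }
```

One wrinkle this sketch exposes: a title that legitimately ends in a slash
and a digit from 1 to 4 (say, "Ratio 1/2") would be mistaken for a size cue,
so a real implementation would need a rule for that case.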

Lists are indented objects beginning with a number or letter, followed by a
dot. When the program is confronted with a list environment, it compares the
current indent to the former one and the result determines when to send the
<end> tag for proper nesting. For bullet lists, we use the letter o with no
following dot.
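
The indent comparison can be sketched with a stack of open list indents, as
in the Python below. This is only a guess at the logic: the <numlist>,
<bullist>, and <item> codes are invented for the example (only <end> is
mentioned above), and each list item is assumed to fit on one line.

```python
import re

# An indented item: a number or single letter plus a dot, or a bare "o" bullet.
ITEM_RE = re.compile(
    r"^(?P<indent> +)(?:(?P<label>\d+|[A-Za-z])\.|(?P<bullet>o))\s+(?P<text>.*)$"
)

def code_lists(lines):
    """Return output lines with hypothetical list codes and <end> tags added."""
    out = []
    stack = []  # indent widths of the lists that are currently open
    for line in lines:
        m = ITEM_RE.match(line.expandtabs())
        if m is None:
            # Anything that is not a list item closes every open list.
            while stack:
                stack.pop()
                out.append("<end>")
            out.append(line)
            continue
        indent = len(m.group("indent"))
        # A smaller indent than the innermost open list ends that list.
        while stack and indent < stack[-1]:
            stack.pop()
            out.append("<end>")
        # A larger indent (or no open list) starts a new, nested list.
        if not stack or indent > stack[-1]:
            stack.append(indent)
            out.append("<bullist>" if m.group("bullet") else "<numlist>")
        out.append("<item>" + m.group("text"))
    while stack:
        stack.pop()
        out.append("<end>")
    return out
```

Keeping the open indents on a stack is what lets a single drop in
indentation close several nested lists at once.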

As each line is processed, a subroutine scans it for any "special
characters" and sends the required string to the gencode file. We like +/-
to appear as a plus sign above a minus. Regular expressions look for degree
symbols and Greek letters like mu and omega, among others. For example, the
string 9oF means 9 degrees F, while 13 uF means 13 microfarads.
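
A special-character pass of this kind could be sketched in Python as a table
of pattern and replacement pairs. The output codes <pm>, <deg>, and <mu> are
placeholders invented here, and the list of unit letters after "u" is an
assumption; the convention for typing omega is not given above, so it is
left out.

```python
import re

# (pattern, replacement) pairs applied in order; all output codes are hypothetical.
SPECIALS = [
    # "+/-" becomes a plus-over-minus code.
    (re.compile(r"\+/-"), "<pm>"),
    # A digit, then "o", then C or F as a whole unit: degrees, as in 9oF.
    (re.compile(r"(\d)o(?=[CF]\b)"), r"\1<deg>"),
    # A digit, optional space, "u", then a unit letter: micro, as in 13 uF.
    (re.compile(r"(\d\s?)u(?=[AFHVW]\b)"), r"\1<mu>"),
]

def code_specials(line):
    """Replace typed stand-ins for special characters with generic codes."""
    for pattern, replacement in SPECIALS:
        line = pattern.sub(replacement, line)
    return line
```

Anchoring each pattern to a preceding digit and a following unit letter is
what keeps ordinary words containing "o" or "u" from being touched.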

A major concern has been that reviewers should not be asked to try to make
their way through a text loaded with coding. We've found that we get higher
quality review remarks when the review copy looks similar to the expected
final page. That is why we have pre-printout filters that convert lines
ending in ---n to boldface. If we do incorporate the "Figure Title/n" idea,
we will probably not print the code even in review copies, instead
converting the number to line or form feeds.
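
A pre-printout filter along these lines might be sketched as follows in
Python. How boldface is actually produced on the review printer is not
stated above; this example assumes the old line-printer trick of overstriking
each character (character, backspace, same character), and the function names
are made up for the illustration.

```python
import re

# Same assumed heading form as the writers use: text followed by ---n.
HEADING_RE = re.compile(r"^(?P<title>.*\S)\s*---(?P<level>\d)\s*$")

def bold_overstrike(text):
    """Bold by overstriking: each non-space character is printed twice."""
    return "".join(c if c.isspace() else c + "\b" + c for c in text)

def review_filter(line):
    """Hide the ---n code in review copies and show the heading in bold."""
    m = HEADING_RE.match(line)
    if m is None:
        return line
    return bold_overstrike(m.group("title"))
```

The point of the design is that the reviewer never sees the ---n marker,
only a heading that looks like a heading.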

Our perl program currently recognizes and generates generic codes for:

    * Section headings

    * Notes, Cautions, and Warnings

    * Textual headings up to 4th order (we tell writers that if they need
      to go beyond 4th order headings, they are probably writing funny).

    * Alpha, numeric, and two types of bullet lists at 4 indent levels

    * Figure and Table Titles

...and of course, everything else is just running text :-)

Many of our manuals need special treatment for a variety of things --
special fonts, in-text keycap art, special formats -- so we by no means have
technical publication reduced to a non-event, but we are getting there!
Generic coding and implied markup are powerful approaches to the traditional
problems in publishing (especially publishing of structured documents as
opposed to books, magazines, and so on).

As I asked before, I'd be very interested in hearing from others who are
using similar methods. Or other perl users! We had our first program written
for us about 2 1/2 years ago, and it is still cranking along, even through
two dozen patch levels.


Gary Benson
Supervisor, Publication Services
John Fluke Mfg. Co. Inc.


-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Go jump in a goddam volcano, you fucking cave newt!   -greg Nowak