Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!snorkelwacker.mit.edu!hsdndev!cmcl2!uupsi!sunic!ugle.unit.no!nuug!ifi.uio.no!enag
From: erik@naggum.no (Erik Naggum)
Newsgroups: comp.text.sgml
Subject: Re: DTD for UNIX Manual Page
Message-ID: <82-7640-006-X/0010@naggum.no>
Date: 29 Jun 91 23:29:08 GMT
References: <1991Jun27.144331.17689@iscnvx.uucp>
Sender: enag@ifi.uio.no (Erik Naggum)
Reply-To: Erik Naggum
Organization: Naggum Software, Oslo, Norway
Lines: 264
Nntp-Posting-Host: gyda.ifi.uio.no
In-Reply-To: emv@msen.com's message of 28 Jun 91 05:23:49 GMT
Originator: enag@gyda.ifi.uio.no

Ed Vielmetti writes:
|
| structure of man page for mythical program.
|
| [example deleted]

Ah, but this is only an instance. I'd imagine the DTD for this instance to look like:

which I think is horrible.

This is also an excellent example of the "structure vs. contents" debate. Ed has given us a list of elements which reflect the contents of the man-page, and has assumed that line breaks are significant in some elements and not in others, that leading blank sequences are ignored, that empty lines are meaningful, that in the examples, user input and program output are intuitively differentiated, that in the see-also element, manual sections are indicated by a parenthesized number, etc., etc., all of which is probably useful for a rich-text rendering of a manual page, but is _very_ hard to make useful for typeset manual pages. (I do think that we need one format for both viewable and printable man pages.)

From a structural point of view, we see that a manual-page consists of a title (with its own particular semantics) and several sections. Sections consist of a header and some contents. The contents can be made up of several types of tokens (command name, options, arguments, to name a few), and maybe examples.
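To make this structural view concrete, here is a minimal Python sketch. It is an illustration only, not part of the proposal below: `Token`, `Section`, `ManPage`, `render`, and the token kinds are hypothetical stand-ins for SGML elements. The page is stored once, by structure, and two style tables render it either as plain text or with troff font escapes, which is the "one format for both viewable and printable man pages" point made above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical token kinds for section contents; the names are
# illustrative and do not come from any published DTD.
COMMAND, OPTION, ARGUMENT, TEXT = "command", "option", "argument", "text"


@dataclass
class Token:
    kind: str  # one of COMMAND, OPTION, ARGUMENT, TEXT
    text: str


@dataclass
class Section:
    heading: str  # the heading always comes first in a section
    contents: List[Token] = field(default_factory=list)


@dataclass
class ManPage:
    title: str
    sections: List[Section] = field(default_factory=list)


# A style maps a token kind to a function that decorates its text;
# kinds missing from the table are printed unchanged.
Style = Dict[str, Callable[[str], str]]

PLAIN: Style = {}

# troff font escapes: \fB bold, \fI italic, \fR back to roman.
TROFF: Style = {
    COMMAND: lambda s: "\\fB" + s + "\\fR",
    OPTION: lambda s: "\\fB" + s + "\\fR",
    ARGUMENT: lambda s: "\\fI" + s + "\\fR",
}


def _unchanged(s: str) -> str:
    return s


def render(page: ManPage, style: Style) -> str:
    """Render one structured page under a given style table."""
    out = [page.title]
    for sec in page.sections:
        out.append(sec.heading.upper())
        words = (style.get(t.kind, _unchanged)(t.text) for t in sec.contents)
        out.append("    " + " ".join(words))
    return "\n".join(out)


# The cal(1) synopsis, tagged by structure rather than by appearance.
CAL = ManPage("cal(1)", [
    Section("synopsis", [
        Token(COMMAND, "cal"), Token(TEXT, "[ ["), Token(ARGUMENT, "month"),
        Token(TEXT, "]"), Token(ARGUMENT, "year"), Token(TEXT, "]"),
    ]),
])

if __name__ == "__main__":
    print(render(CAL, PLAIN))
    print(render(CAL, TROFF))
```

Swapping the style table changes only the presentation; the tagged page itself stays the same, so adding, say, a colour for user input in examples would be one more entry in a table rather than a change to the document.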
Examples should differentiate between system prompts and output on the one hand and user input on the other, if it's desirable to print the user input in blue, in italicized characters, underlined, or whatever. The contents will also need to contain lists of several types and highlighted phrases which are not any of the token types listed above, and may need to use special words in small capitals (e.g. "UNIX"). Use of constant-width fonts for examples and certain keywords has been useful in the past.

I'm looking at the man pages stored on this system (SunOS 4.1, I think), and they're pretty hairy, with lots of details. Lar Kaufman told me that the OSF has done something on manual pages, but I haven't had time to find out what they've done.

The following is an attempt to make a useful manual page DTD.

THIS IS AN EXAMPLE, ONLY. DON'T USE IT WITHOUT CONTACTING ME. No warranties, express or implied. All rights to original material reserved. Permission to use as an example is granted to readers of comp.text.sgml.

This material contains quoted material from Sun Microsystems, Inc. SunOS™ Reference Manual, copyright 1987, 1988 by Sun Microsystems, Inc. Material used for instructional purposes only. ISBN number is valid, but not assignable. Don't refer to it.

and an example man-page (cal(1)) (from the SunOS 4.1 distribution):

NAME

cal -- display a calendar

SYNOPSIS

cal [ [ month ] year ]

DESCRIPTION

cal displays a calendar for the specified year. If a month is also specified, a calendar for that month only is displayed. If neither is specified, a calendar for the present month is printed.

year can be between 1 and 9999. Be aware that `cal 78' refers to the early Christian era, not the 20th century. Also, the year is always considered to start in January, even though this is historically naive.

month is a number between 1 and 12.

The calendar produced is that for England and her colonies.

Try September 1752.

A simple description of this document type:

A manual entry consists of several sections, has an entry and a section attribute associated with it, and may have revision date and OS release attributes as well. The section attribute value starts with a digit, which may be followed by any digits or letters. Together with the section, the entry is used for indexing and referencing purposes. Both of these could probably be subsumed by the file name, but there's no way to specify or retrieve the file name of an entity in SGML, so I left them as attributes, probably machine-generatable.

Sections have headings and a type, which reflects the intrinsic type of a section, such as name, synopsis, description, options, references (see also), files, bugs, author, etc. The type may be redundant, since the content of the heading element is supposed to reflect it, but may be useful for alternate versions. (SunOS has both BSD and SysV options, and needs to distinguish them, but they're both option-type sections.) The heading is always the first element after the section start-tag, and its tags may be omitted. Following the heading are several paragraphs of text.

Paragraphs are more content-oriented than the overall structure, in that several types of elements are specifically named, such as option list, (other) list, example, and citation, in addition to general paragraph content: highlighted phrases, command names, argument names, option names, and user-specifiable other content, such as file names.

An option list contains a list of options, each with an optional argument and regular paragraph content. Other lists may have a tag associated with them, useful only if the type attribute is specified as tag. Ordered and unordered lists are available, the latter being the default. Examples contain alternating user and system input and output, where the exact names of these elements are user-defined.
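The attribute rules just described are simple enough to check mechanically. The Python sketch below is one possible reading of them, not code from the DTD: the function names are hypothetical, and so are the list type values "ordered" and "unordered" (only "tag" is named in the text). The `reference` helper assumes the familiar parenthesized form, e.g. `cal(1)`, for the key built from entry and section.

```python
import re
from typing import Optional

# Section attribute: a digit, optionally followed by more digits or
# letters (e.g. "1", "3x", "1V"), as described above.
SECTION_RE = re.compile(r"[0-9][0-9A-Za-z]*")


def valid_section(section: str) -> bool:
    return SECTION_RE.fullmatch(section) is not None


def reference(entry: str, section: str) -> str:
    """Index/cross-reference key such as 'cal(1)', built from the
    entry and section attributes."""
    if not valid_section(section):
        raise ValueError(f"bad section attribute: {section!r}")
    return f"{entry}({section})"


# Hypothetical names for the list type values; only "tag" is named above.
LIST_TYPES = ("ordered", "unordered", "tag")


def effective_list_type(declared: Optional[str] = None) -> str:
    """Unordered is the default when no type is declared."""
    if declared is None:
        return "unordered"
    if declared not in LIST_TYPES:
        raise ValueError(f"unknown list type: {declared!r}")
    return declared


def item_label(list_type: str, position: int, tag: Optional[str] = None) -> str:
    """Label for the item at a 1-based position. A tag is used only
    when the list type is 'tag'; otherwise it is ignored."""
    if list_type == "ordered":
        return f"{position}."
    if list_type == "tag":
        return tag or ""
    return "-"  # unordered: a plain bullet


if __name__ == "__main__":
    print(reference("cal", "1"))                      # cal(1)
    print(item_label(effective_list_type(), 1, "x"))  # -
```

Because entry and section are plain attributes, a tool can generate or verify them from the file name, which is what "probably machine-generatable" suggests.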
The citation element may contain data and highlighted phrases, whereas the other paragraph content just contains data. The highlighted phrase has a rendition attribute associated with it, whose values are user-specifiable and which defaults to bold, italic, bold italic, small caps and roman via their initials. The otherh, otherp, x.con, and rend parameter entities may be changed to reflect special needs in some manual entries. One might have another predefined section type by invoking the DTD with the following syntax:

...

AUTHOR

Ken Olson

Using Ed's example command, my suggestion would look like this:

NAME

verbify -- turn nouns into verbs

SYNOPSIS

verbify [ -s stemlist ] noun

verbify -v [ -s stemlist ] verb

DESCRIPTION

verbify verbifies nouns into a state of submission.

ENVIRONMENT

Options can be specified in the VERBOPTS environment parameter. Where conflicts exist, options on the command line take precedence.

EXAMPLES

<sys>% <user>verbify verification
<sys>verify
<sys>verificationalize
<sys>% <user>verbify -v negate
<sys>negation
<sys>negative

FILES

/usr/lib/verbify/stemming    stem rules
/usr/dict/words              system dictionary

LIMITATIONS

Words can be no longer than 1024 characters.

SEE-ALSO

prepositionalize, gerundify

AUTHORS

Naomi Valentine

DIAGNOSTICS

-- not a noun: unchanged

When the input word is not a noun (i.e., a participle), it is left unaltered.

BUGS

The -v switch is a crock.

Performance is slow and uses system resources prodigiously.

The stemming rules' coverage is uneven; new installations will probably want to monitor the output for several months to gather local additions.

There is definitely room for improvement in both the DTD and the examples. I hope this has been useful.

--
Erik Naggum             Professional Programmer        +47-2-836-863
Naggum Software         Electronic Text
0118 OSLO, NORWAY       Computer Communications