Path: utzoo!utgpu!water!watmath!watdragon!watsol!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Newsgroups: comp.text
Subject: SGML defended (Long)
Message-ID: <7986@watdragon.waterloo.edu>
Date: 25 Jul 88 17:50:53 GMT
References: <61024@sun.uucp>
Sender: daemon@watdragon.waterloo.edu
Reply-To: tbray@watsol.waterloo.edu (Tim Bray)
Organization: New Oxford English Dictionary Project, U. of Waterloo, Ontario
Lines: 111

In article <61024@sun.uucp> tut%cairo@Sun.COM (Bill "Bill" Tuthill) writes:
> I'm moving a discussion of SGML started in comp.text.desktop into this
> newsgroup, because I think the issues are larger than a desktop.
and so on.

I have to disagree with nearly every line of Bill Tuthill's contribution.
There are real problems with SGML, but they are not the ones he identifies.
I think the problem is that he considers SGML strictly as a typesetting
system, which is really beside the point. Detailed discussion follows, but
the important points are:

1. If any on-line use for a document other than printing it out (Hypertext,
   information retrieval, on-line documentation) is contemplated, structural
   rather than typographical markup is a necessity. The arguments for this
   are many and are overpowering in their force. Rather than run through
   them, I refer everyone to the excellent article `Markup Systems and the
   Future of Scholarly Text Processing', by Coombs, Renear, and DeRose in
   the Nov. '87 CACM.
2. The SGML standard is a crock. I have not read it, but this is the
   unanimous consensus of everyone I know who has tried to work with it.
   The basic SGML syntax and concepts, however, are sound.

I think the logical conclusion should be: let's not let the failure of the
standards drafters deter us from using this basically good idea.

Now, to address Mr. Tuthill's points:

>Instead, SGML should be compared to decent
>procedural languages such as troff and TeX.
>There are good reasons why
>troff and TeX macro packages were invented: well-designed macros provide
>writers with a descriptive layer ...

No, SGML shouldn't be compared to these things. SGML and the typesetting
packages exist to solve different problems. When you want to typeset your
SGML document, you should translate it into troff or TeX or PostScript or
something that's good at that job. SGML exists to prevent typographical
nits from getting in the way of structural document design decisions. See
the CACM article.

>SGML is no panacea for portability. Being a metalanguage, SGML does not
>provide one syntax, but only a method for describing different syntaxes.
>On p. 68 Goldfarb states, "SGML allows variant concrete syntaxes." This
>is tantamount to saying it isn't really standard. It would probably be
>as difficult to translate between variant syntaxes as to translate between
>troff and Interleaf or Frame.

The great virtue of SGML is that it is very easy for computers to parse and
is probably the most flexible form in which it is possible to store text.
Our practical experience on the New OED project is that the first thing to
do with input text is to do away with all the typesetting gibberish and get
some approximation of SGML tags in there. You don't have to worry too much
about getting them right; once the basic structure is there, it's
remarkably easy to transform the text into the right setup, once you figure
out what that should be.

>SGML was born obsolete. Graphics are missing from the specification, as
>are provisions for tables and equations.

It is certainly possible in SGML to make a reference to an externally-stored
graphic. Then at typesetting time, you copy in the appropriate
PostScript/pic/rasterfile or whatever. SGML does indeed allow the
specification of tables and equations, in a typography-independent way that
lends itself to a variety of information-retrieval applications. Try to
make automatic sense out of tbl or eqn source!
On the other hand, it's easy to translate SGML structures *into* tbl or eqn
or whatever.

>SGML:
> This added information, called markup, serves two purposes:
>   1. Separating the logical elements of the document; and
>   2. Specifying the processing functions to be performed on those elements.
> This figure represents divine document intervention.
--------
>troff
> This added information, called \*Qmarkup\*U, serves two purposes:
> .NP
> Separating the logical elements of the document; and
> .NP
> Specifying the processing functions to be performed on those elements.
> .LP
> This figure represents divine document intervention.

Which of these, do you think, lends itself better to online IR applications?
Which is more easily automatically translated to the other? Both answers
are obvious.

>In the concrete syntax described, the
>ASCII characters < > & % ; appear to be reserved symbols, but Goldfarb
>offers no method for printing these characters literally.

'<': &lt;. '>': &gt;. '&': &amp;. etc...

>SGML documents are supposed to be rigorous, but
>rigorous means inflexible.

A good point, and one of the big problems with the SGML standard. ISO SGML
requires that one prepare what amounts to a *prescriptive* grammar for your
document. This may be appropriate for airplane checkout manuals (maybe), but
most document creators, when you get right down to it, know what they're
doing pretty well and don't need a grammar getting in their way. Also there
is the (common) problem of wanting to mark up an existing body of text (for
example the Oxford English Dictionary) which just ain't gonna always follow
the rules. Does this mean one gives up the descriptive power of structural
markup?

Hey, I like troff/TeX and so on for doing typesetting. But typesetting is
just one of many things that can be done with an electronic document. If you
want enough flexibility to do some of those other things, don't limit
yourself to typographical markup.

Cheers, Tim Bray, New Oxford English Dictionary Project
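
A minimal sketch of the translation direction argued for above (structural
tags in, troff-style macro lines out), offered as an illustration only. The
tag names (p, q, ol, li, fig) and the tag-to-macro table are assumptions made
up for this example; the macro names (.LP, .NP, \*Q, \*U) are simply the ones
that appear in the quoted troff sample. It leans on Python's standard
html.parser as a stand-in for a real SGML parser, so it needs explicit end
tags and knows nothing about DTDs.

```python
from html.parser import HTMLParser

# Hypothetical mapping: tag names are assumed for illustration, macro
# names are copied from the quoted troff sample.
BLOCK_MACROS = {"p": ".LP", "li": ".NP", "fig": ".LP"}
INLINE_STRINGS = {"q": ("\\*Q", "\\*U")}


class TroffWriter(HTMLParser):
    """Turn a small tagged fragment into troff-style input lines."""

    def __init__(self):
        # Default convert_charrefs=True: "&lt;" reaches handle_data as "<".
        super().__init__()
        self.lines = []  # finished output lines
        self.text = []   # pieces of the text block being collected

    def flush(self):
        # Emit the collected text as one line with whitespace normalized.
        joined = " ".join("".join(self.text).split())
        if joined:
            self.lines.append(joined)
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_MACROS:
            self.flush()
            self.lines.append(BLOCK_MACROS[tag])
        elif tag in INLINE_STRINGS:
            self.text.append(INLINE_STRINGS[tag][0])
        # Any other tag (for example "ol") is pure structure: no output.

    def handle_endtag(self, tag):
        if tag in INLINE_STRINGS:
            self.text.append(INLINE_STRINGS[tag][1])
        elif tag in BLOCK_MACROS:
            self.flush()

    def handle_data(self, data):
        self.text.append(data)


def to_troff(source):
    """Return troff-style text for a tagged fragment."""
    writer = TroffWriter()
    writer.feed(source)
    writer.close()
    writer.flush()
    return "\n".join(writer.lines)


SAMPLE = (
    "<p>This added information, called <q>markup</q>, "
    "serves two purposes:</p>"
    "<ol><li>Separating the logical elements of the document; and</li>"
    "<li>Specifying the processing functions to be performed "
    "on those elements.</li></ol>"
    "<fig>This figure represents divine document intervention.</fig>"
)

if __name__ == "__main__":
    print(to_troff(SAMPLE))
```

Going this way takes one lookup table and a few dozen lines. The sketch does
not protect text lines that happen to begin with "." or contain backslashes,
which a real troff back end would have to escape.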