Path: utzoo!utgpu!watmath!watdragon!watsol!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Newsgroups: comp.text
Subject: What SGML is and isn't, again
Message-ID: <15297@watdragon.waterloo.edu>
Date: 21 Jul 89 02:04:56 GMT
Sender: daemon@watdragon.waterloo.edu
Reply-To: tbray@watsol.waterloo.edu (Tim Bray)
Distribution: world
Organization: New OED Project, U. of Waterloo, Ontario
Lines: 74

I posted recently on text modelling and SGML; among other things I said:

> But the SGML standard itself is horribly flawed and permits some things which are unhelpful and even dangerous. The details are too lengthy and sordid to go into here, but I can talk in detail on request.

Well, lots of people requested. Herewith some specific gripes.

But first: I'm not an SGML expert. SGML requires that you write a prescriptive grammar for your text before you can begin to use it, which you can't do for the OED, or for a large class of similar existing reference documents (e.g. big dictionaries, legislation & commentary). So we gave up on it some time ago. We do read , and I've struggled the best part of the way through the standard (arrgh), and try to keep in touch, but...

Bear in mind that all the specific gripes listed below are just background to this one big problem, which keeps SGML from being useful for an important class of interesting reference texts.

There are a lot of people working with (and some betting their jobs on) SGML. If I'm going to sit here and point out what look like problems, somebody really should get in and present the other side. I don't think the SGML folk are fools, or even wrong; it's just that there is this big class of problems for which the system breaks down.

The SGML spec is ugly, ugly, ugly. I can read telephone-system technical specs, but I have BIG comprehension trouble with the standard. The SGML meta-syntax is ugly, ugly, ugly. I have heard it likened to OS JCL.
Such ugliness may not be fatal in and of itself, but it is often symptomatic of design flaws at the core.

Parsability. The SGML syntax is such that it can't be parsed by a context-free grammar. This is just stupid, since it could have been specified right without loss of expressive power. How long has SGML been around, and how many real industrial-strength parsers are there? I've heard of two. Never looked inside one; don't think I want to.

Tag minimization. The SGML standard allows all sorts of tags to be omitted, shortened, or expressed along the lines of when it's "obvious" what's going on. I understand that specifying when it's obvious has turned out to be an intractable problem, and the spec may end up saying "and when such minimization doesn't make the grammar ambiguous." Sounds like an admission of defeat to me. The idea, it seems, was to make it easier to type in all those tedious "<,/,>" sequences. Well, if your editor program isn't smart enough to complete a partly-entered tag, or to supply the appropriate end tag with a single keystroke, you're in deep trouble anyhow. This is the same kind of thinking that gave us keyword abbreviation in PL/I. It wouldn't be a problem, except that people actually *use* this misfeature.

Tag attributes. This is a very controversial area in the SGML field. Nobody can agree whether tag attributes should or should not be used (people seem to be drifting towards "as little as possible") and, if so, what should be expressed as a tag and what as an attribute. I note that all the SGML editors I've seen handle attributes rather awkwardly.

Intellectual impoverishment. (Special case of the One Big Gripe above.) Once people have invested all that time in writing a DTD and bludgeoning a document into line, they start believing in Complete Descriptive Markup and the Tooth Fairy.
That is, they feel they have captured all the important structures, and if I, with my flexible computing tools, want to start improving or further elucidating the structure, I'm out of luck (and they don't want to go near that grammar again if they don't have to). This is regardless of the fact that human language has near-infinite flexibility and resists formal specification, as Chomsky et al. found out in the '60s.

Enuffa that. Maybe SGML is the right structure for an important subclass of the text universe; in particular, for interchange of relatively simple documents. But if it can't handle the big reference documents, then for a lot of applications it's a toy.

Tim Bray, New OED Project, U of Waterloo, Ont., Canada