From: enag@ifi.uio.no (Erik Naggum)
Newsgroups: comp.text.sgml
Subject: Re: Looking for on-line DTD's and/or SGML document files
Date: 3 Apr 91 12:19:31 GMT
References: <21121@gremlin.nrtc.northrop.com>
Sender: enag@ifi.uio.no (Erik Naggum)
Followup-To: comp.text.sgml
Organization: Naggum Software, Oslo, Norway
Lines: 82
In-Reply-To: jpl@bat's message of 2 Apr 91 20:12:28 GMT

Jeff,

The CALS specifications (and DTDs) are available from an FTP server somewhere. I'll try to find out where I found them. (I was subsequently yelled at for wasting disk space, so with luck they may now survive only on a tape somewhere.)

In the meantime, I'd suggest paying careful attention to the parameter separator (ps) and to entity references, which in my experience are mishandled by simple-minded parsers and are somewhat hard to get right unless you grok a few things about how the standard presumes things work. The standard does tell you about them, briefly, in F.1.1.1 Entities...

For instance, given

This declaration is syntactically legal but restricted by these two short paragraphs in 10.1.1 Parameter Separator (page:line numbers refer to The SGML Handbook):

    372:15  A required /ps/ that is adjacent to a delimiter or another
            /ps/ can be omitted if no ambiguity would be created thereby.

    372:18  A /ps/ must begin with an /s/ if omitting it would create an
            ambiguity.

Recall the syntax production for the parameter separator:

    [65]  ps = s | Ee | parameter entity reference | comment

The parsed result is the same as if the declaration had been:

but is really

    ----^ ----^^^^^ ----^^^

where the underlined parts are treated as separators, the ^'ed parts are the text of the referenced entity, and _ indicates the Entity end signal.
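The data flow behind that diagram can be sketched in Python. This is a minimal sketch under loudly simplified assumptions, not a conforming parser: the lexical rules for /s/, names, comments and parameters only approximate the reference concrete syntax, the input is a declaration body without its opening and closing delimiters, and `Token`, `PATTERNS` and `declaration_tokens` are hypothetical names. Undefined or recursive entities are not checked. The point it illustrates is that the parser pulls text from a stack of open entities: a parameter entity reference is recognized as a ps in its own right, the parser then reads from the entity, and when the entity runs out the entity manager reports an Ee, which the parser again treats as a ps.

```python
import re
from dataclasses import dataclass

# Rough approximations of the reference concrete syntax (assumptions):
# /s/ is ASCII whitespace, names are ASCII letters/digits/'.'/'-', and any
# other run of characters up to the next separator counts as one parameter.
PATTERNS = [
    ("s", re.compile(r"[ \t\r\n]+")),
    ("comment", re.compile(r"--.*?--", re.S)),
    ("peref", re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*);?")),
    ("param", re.compile(r"(?:(?!--)[^\s%])+|%")),
]


@dataclass
class Token:
    kind: str    # "s", "comment", "peref", "Ee", or "param"
    text: str
    entity: str  # entity the token was read from ("#decl" = the declaration)


def declaration_tokens(text, entities):
    """Tokenize a declaration body, reading parameter entities through a
    stack of open entities instead of substituting their text in place."""
    stack = [["#decl", text, 0]]  # entries: [entity name, text, read position]
    while stack:
        top = stack[-1]
        name, src, pos = top
        if pos == len(src):
            stack.pop()
            if stack:  # a referenced entity ended: report Ee to the parser
                yield Token("Ee", "", name)
            continue
        for kind, pattern in PATTERNS:
            m = pattern.match(src, pos)
            if m:
                break
        else:
            raise ValueError(f"cannot tokenize {src[pos:]!r} in entity {name}")
        top[2] = m.end()
        yield Token(kind, m.group(0), name)
        if kind == "peref":
            # The reference itself was a ps; now feed the parser from the entity.
            ref = m.group(1)
            stack.append([ref, entities[ref], 0])


if __name__ == "__main__":
    # The last three tokens printed are:
    #   peref '%content;' #decl / param '(a|b|c)*' content / Ee '' content
    for tok in declaration_tokens("ELEMENT foo - - %content;", {"content": "(a|b|c)*"}):
        print(tok.kind, repr(tok.text), tok.entity)
```

Because the entity's end arrives as its own token, the parser knows exactly which entity every parameter came from, and it can apply entity-end rules without any special "quirks".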
Note that the parameter entity reference is a separator in and of itself, but that an /s/ is required in the above element declaration because of [372:18]. It is not always easy to determine when something would constitute "an ambiguity", since the term is aimed primarily at the human reader, not at the computer, which can handle these cases perfectly well.

Some implementations get this wrong because they treat entity references as textual replacements rather than as calls to the entity manager, which then feeds the parser from the entity until it ends. (Such an implementation would get "", and would not be fully able to check for entity ends unless it had "quirks" in it solely for this purpose, which some smaller parsers actually have.)

After studying the spec and Goldfarb's excellent book, I've concluded that something needs to be said about the data flow model in an SGML parser. It is not intuitively evident from the spec itself, and it has posed some problems. The effect is primarily on the conceptual model, but the conceptual model invariably has a major effect on the implementation techniques employed, and thus on the result; a wrong model introduces subtle bugs that are difficult to remove.

Well, all of this may not apply to you, but you might still find it useful. I know for sure that I spent a lot of time "getting" why entity references weren't allowed everywhere, as macro calls are in, e.g., C, and why they are actually handled by the parser. In short, entities are not as straightforward as they might seem. (Or is it only me?)

--
[Erik Naggum]    Naggum Software, Oslo, Norway
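To make the contrast with textual replacement concrete, here is a minimal Python sketch of the macro-style approach warned against above. The reference syntax is simplified (an assumption), and `PEREF` and `expand_textually` are hypothetical names. Two declarations, one where a model group comes whole from an entity and one where the group is split across the entity's end, expand to the identical string. A parser fed that string has no Ee to look at, so whatever the standard says about such splits, it cannot check it without extra bookkeeping.

```python
import re

# Simplified parameter entity reference: '%' name, optional ';' (an assumption;
# the reference concrete syntax has more rules for names and reference ends).
PEREF = re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*);?")


def expand_textually(text, entities):
    """Macro-style expansion: splice each entity's text in place of its reference.
    Shown only for contrast with feeding the parser through an entity manager."""
    return PEREF.sub(lambda m: entities[m.group(1)], text)


# A model group supplied whole by one entity ...
whole = expand_textually("ELEMENT foo - - %content;", {"content": "(a|b)"})
# ... and one split across an entity boundary expand to the same string,
# so nothing downstream can tell where either entity ended.
split = expand_textually("ELEMENT foo - - %half;b)", {"half": "(a|"})
print(whole == split, whole)  # True ELEMENT foo - - (a|b)
```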