Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!sdrc!thor!scjones
From: scjones@thor.UUCP (Larry Jones)
Newsgroups: comp.text
Subject: Re: SGML question
Summary: SGML badly flawed
Keywords: SGML, ambiguity
Message-ID: <146@thor.UUCP>
Date: 30 Aug 90 14:16:55 GMT
References: <555@helios.prosys.se>
Organization: SDRC, Cincinnati
Lines: 53

In article <555@helios.prosys.se>, ath@prosys.se (Anders Thulin) writes:
> I've been trying to make sense of the SGML standard. I'm beginning to
> think that it shouldn't be done at home, and only attempted by highly
> trained professionals :-,

It shouldn't be done at all. The SGML standard is, without a doubt, the
poorest excuse for a language standard I have ever seen. I can only
assume that it was developed by people with no knowledge of formal
languages who adopted a formalism to make the resulting document look
more technical. Regular expressions are clearly inappropriate for
describing the language; a phrase structure grammar would have been much
better. In addition, the standard contains a number of errors,
inconsistencies, ambiguities, and violations of its own formalisms.

The SGML standard is in dire need of interpretation and revision.
Unfortunately, unlike ANSI standards, which clearly specify how to go
about submitting comments and requesting interpretations, ISO standards
provide no clues at all.

Now, having gotten that out of my system, allow me to try to provide
some answers to your questions.

> This is my problem. Section 9.6.1 seems to be very clear on the extent
> of CON mode: it's essentially used `inside' the _content_ production
> [27]. How, then, are delimiters *outside* _content_ recognized?

That's what it says, but it's not what it means. CON mode is simply the
default mode when you're not in any other mode. So you do indeed start
off in CON mode at the very beginning of the entire document, and you
return to it whenever a nested mode ends.

> Another problem is the ">" delimiter.
> According to the table in Figure 3 (page 31) in the standard, ">" is
> recognized as either MDC or TAGC in CXT mode. But I find nothing that
> says why it should be recognized as one rather than the other.

CXT isn't a real mode; it's just a pseudo-mode. Some delimiters aren't
recognized as such unless they are followed by some particular context,
as specified in 9.6.2. This context may include other delimiters, so CXT
mode is used to indicate that a delimiter needs to be recognized while
verifying the context of another delimiter. Thus, there should really be
a separate CXT mode for each of the contexts listed in 9.6.2 (e.g.,
DCL-CXT, GI-CXT, etc.), and the table in Figure 3 should be expanded
accordingly (e.g., TAGC should be GI-CXT [but only if "SHORTTAG YES" is
specified on the SGML declaration], and MDC should be DCL-CXT and
MSE-CXT).

----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
Oh, now YOU'RE going to start in on me TOO, huh? -- Calvin
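
To make the two answers above concrete, here is a toy Python sketch. It is hypothetical and is not taken from the standard or from any real SGML parser. It assumes the reference concrete syntax delimiter strings (STAGO "<", ETAGO "</", MDO "<!", MSE "]]", TAGC ">", MDC ">", COM "--"), simplifies name start characters to ASCII letters, and models only the GI, DCL and MSE contexts. The names ModeStack, recognize and is_name_start are made up for illustration. ModeStack shows CON as the default mode you start in and fall back to. recognize shows a ">" being recognized only while the context of another delimiter is being verified, labelled with the per-context pseudo-mode (GI-CXT, DCL-CXT, MSE-CXT) proposed above instead of a single CXT.

```python
"""Toy model of SGML-style contextual delimiter recognition.

Hypothetical sketch, NOT a conforming ISO 8879 parser. Assumes the
reference concrete syntax delimiters; name start characters are
simplified to ASCII letters; only GI, DCL and MSE contexts are modelled.
"""
from typing import List


class ModeStack:
    """CON is the default mode: the parser starts in it and falls back to it."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    @property
    def current(self) -> str:
        # With no nested mode active, we are in CON.
        return self._stack[-1] if self._stack else "CON"

    def enter(self, mode: str) -> None:
        self._stack.append(mode)

    def leave(self) -> None:
        self._stack.pop()


def is_name_start(ch: str) -> bool:
    # Simplification: real name start characters depend on the concrete syntax.
    return ch.isascii() and ch.isalpha()


def recognize(text: str, pos: int, shorttag: bool = False) -> List[str]:
    """Return the delimiters recognized at text[pos], or [] if none.

    A delimiter that needs a context is recognized only if that context
    follows. A ">" recognized while checking such a context is labelled
    with its per-context pseudo-mode rather than a catch-all CXT.
    """
    rest = text[pos:]

    # MDO "<!" needs a DCL context: a name start character, COM, or MDC.
    if rest.startswith("<!"):
        after = rest[2:]
        if after and is_name_start(after[0]):
            return ["MDO"]
        if after.startswith("--"):
            return ["MDO", "COM"]
        if after.startswith(">"):
            return ["MDO", "MDC in DCL-CXT"]  # "<!>", empty comment declaration
        return []

    # ETAGO "</" and STAGO "<" need a GI context: a name start character,
    # or (only with SHORTTAG YES) TAGC, as in the empty tags "<>" and "</>".
    for name, delim in (("ETAGO", "</"), ("STAGO", "<")):
        if rest.startswith(delim):
            after = rest[len(delim):]
            if after and is_name_start(after[0]):
                return [name]
            if shorttag and after.startswith(">"):
                return [name, "TAGC in GI-CXT"]

    # MSE "]]" needs an MSE context: it must be followed by MDC.
    if rest.startswith("]]") and rest[2:].startswith(">"):
        return ["MSE", "MDC in MSE-CXT"]

    return []
```

For example, recognize("<>", 0, shorttag=True) yields ["STAGO", "TAGC in GI-CXT"], while recognize("]]>", 0) yields ["MSE", "MDC in MSE-CXT"]. The same ">" character is classified by the context being verified, which is exactly the information a single CXT column in Figure 3 hides.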