Path: utzoo!attcan!uunet!jarthur!usc!apple!olivea!mintaka!spdcc!esegue!compilers-sender From: jas@Ingres.COM (Jim Shankland) Newsgroups: comp.compilers Subject: Re: Can Pascal be parsed by LR(1) parsing algorithm? Keywords: Pascal, LL(1) Message-ID: <9010282146.AA04999@ingres.ingres.com> Date: 28 Oct 90 21:46:06 GMT References: <9112@fy.sei.cmu.edu) <9010232339.AA20860@Alliant.COM> <1990Oct26.220803.1411@Neon.Stanford.EDU> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: jas@Ingres.COM (Jim Shankland) Organization: The Eddie Group Lines: 75 Approved: compilers@esegue.segue.boston.ma.us In article <1990Oct26.220803.1411@Neon.Stanford.EDU> andy@Theory.Stanford.EDU (Andy Freeman) writes: >In article <9010232339.AA20860@Alliant.COM> crocker@Alliant.COM (Ben Crocker) writes: >>Having written a Pascal compiler with an LL(1) parser generator, I can >>vouch for the proposition that Pascal is LL(1). > >Such compilers are built on tokenizers with 2 character look-ahead. Remember >that "1..5" has the same tokens as "1 .. 5", but requires 2 character >look-ahead to distinguish from streams containing "1.". > >Look-ahead 2 tokenising feeding a Lx(1) parser does not demonstrate that >Pascal is Lx(1); it demonstrates that a tokenized version of a language may >have different look-ahead requirements than the stream-of-characters version. Such postings should best be made from a machine with a name other than "theory.stanford.edu" :->. For any language with an LR(n) grammar, n > 0, there is an LR(n -1) grammar accepting the same language. In other words, a language is either LR or not LR; it makes no sense to say that a language is LR(k), for any particular k. There are practical considerations, in that the LR(n) grammar for a language may be considerably smaller and more comprehensible than the LR(n - 1) grammar. But that's all. The original proof of this is due to Knuth; it can no doubt be found in many sources, certainly in Hopcroft and Ullman. (I'm away from my references, so I can't be more precise; sorry.) As for LL languages, in <1990Oct26.213044.28243@Neon.Stanford.EDU>, Sriram Sankar (sankar@neon.stanford.edu) asserts: >LL(n+1) languages are a strict superset of LL(n) languages; As I said, I'm away from my references, and I'm not really a theoretician, but I believe this, too, to be incorrect. I think that if there is an LL(n + 1) grammar for a language, then there will also be some LL(n) grammar for the language (n >= 0). Consider the following proof sketch showing that an LL(2) grammar can be rewritten into an LL(1) grammar (the generalization should be straightforward): If a grammar G is LL(2), but not LL(1), then there must be productions of the form A --> a alpha B --> a beta such that both productions could occur from the same parser state. (A and B might be the same non-terminal.) Intuitively, the parser cannot "know" which production is the correct one with only the "a" as lookahead. Now rewrite the grammar as follows: remove those two productions and replace them with A' --> alpha B' --> beta where A' and B' are new non-terminals not used elsewhere; also, for any production P having an A or a B in its right-hand side, add a new production P' that is just like P, except that every occurrence of A on the rhs has been replaced by "a A'"; similarly with B, substituting "a B'". Intuitively, this defers the decision of which parse is correct by pushing the a and encoding the fact that an a was read in the parser state. Finally, as to whether Pascal is or is not LL(anything): I will ship a case of Anchor Steam beer to anyone who can show that Pascal is even context-free by writing a context-free grammar that will accept only correct Pascal programs and reject all other strings, including strings that would be correct Pascal programs except that an undeclared variable is used in the program. No fair cheating by using a Turing machine (e.g., C code) to maintain a symbol table. jas -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.