Path: utzoo!attcan!uunet!jarthur!usc!apple!olivea!mintaka!spdcc!esegue!compilers-sender
From: jas@Ingres.COM (Jim Shankland)
Newsgroups: comp.compilers
Subject: Re: Can Pascal be parsed by LR(1) parsing algorithm?
Keywords: Pascal, LL(1)
Message-ID: <9010282146.AA04999@ingres.ingres.com>
Date: 28 Oct 90 21:46:06 GMT
References: <9112@fy.sei.cmu.edu) <9010232339.AA20860@Alliant.COM> <1990Oct26.220803.1411@Neon.Stanford.EDU>
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: jas@Ingres.COM (Jim Shankland)
Organization: The Eddie Group
Lines: 75
Approved: compilers@esegue.segue.boston.ma.us

In article <1990Oct26.220803.1411@Neon.Stanford.EDU> andy@Theory.Stanford.EDU (Andy Freeman) writes:
>In article <9010232339.AA20860@Alliant.COM> crocker@Alliant.COM (Ben Crocker) writes:
>>Having written a Pascal compiler with an LL(1) parser generator, I can
>>vouch for the proposition that Pascal is LL(1).
>
>Such compilers are built on tokenizers with 2 character look-ahead.  Remember
>that "1..5" has the same tokens as "1 .. 5", but requires 2 character
>look-ahead to distinguish from streams containing "1.<digit>".
>
>Look-ahead 2 tokenising feeding a Lx(1) parser does not demonstrate that
>Pascal is Lx(1); it demonstrates that a tokenized version of a language may
>have different look-ahead requirements than the stream-of-characters version.

Such postings should best be made from a machine with a name other than
"theory.stanford.edu" :->.

For any language with an LR(n) grammar, n > 0, there is an LR(n -1)
grammar accepting the same language.  In other words, a language is
either LR or not LR; it makes no sense to say that a language is LR(k),
for any particular k.  There are practical considerations, in that
the LR(n) grammar for a language may be considerably smaller and more
comprehensible than the LR(n - 1) grammar.  But that's all.  The original
proof of this is due to Knuth; it can no doubt be found in many sources,
certainly in Hopcroft and Ullman.  (I'm away from my references, so I
can't be more precise; sorry.)

As for LL languages, in <1990Oct26.213044.28243@Neon.Stanford.EDU>,
Sriram Sankar (sankar@neon.stanford.edu) asserts:

>LL(n+1) languages are a strict superset of LL(n) languages;

As I said, I'm away from my references, and I'm not really a theoretician,
but I believe this, too, to be incorrect.  I think that if there is
an LL(n + 1) grammar for a language, then there will also be some
LL(n) grammar for the language (n >= 0).  Consider the following proof
sketch showing that an LL(2) grammar can be rewritten into an LL(1)
grammar (the generalization should be straightforward):

If a grammar G is LL(2), but not LL(1), then there must be productions
of the form 

	A --> a alpha
	B --> a beta

such that both productions could occur from the same parser state.  (A and B
might be the same non-terminal.)  Intuitively, the parser cannot "know" which
production is the correct one with only the "a" as lookahead.

Now rewrite the grammar as follows:  remove those two productions and
replace them with

	A' --> alpha
	B' --> beta

where A' and B' are  new non-terminals not used elsewhere; also,
for any production P having an A or a B in its right-hand side,
add a new production P' that is just like P, except that every
occurrence of A on the rhs has been replaced by "a A'"; similarly
with B, substituting "a B'".

Intuitively, this defers the decision of which parse is correct by pushing
the a and encoding the fact that an a was read in the parser state.

Finally, as to whether Pascal is or is not LL(anything): I will ship a case
of Anchor Steam beer to anyone who can show that Pascal is even context-free
by writing a context-free grammar that will accept only correct Pascal
programs and reject all other strings, including strings that would be
correct Pascal programs except that an undeclared variable is used in the
program.  No fair cheating by using a Turing machine (e.g., C code) to
maintain a symbol table.

jas
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.