Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!yale!mintaka!spdcc!esegue!compilers-sender From: bliss@sp64.csrd.uiuc.edu (Brian Bliss) Newsgroups: comp.compilers Subject: Re: Can Pascal be parsed by LR(1) parsing algorithm? Keywords: pascal, parse Message-ID: <1990Oct10.223215.19174@csrd.uiuc.edu> Date: 10 Oct 90 22:32:15 GMT References: <9010091533.AA02386@apple.com> <9010101445.AA06181@dynamo.ecn.purdue.edu> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: bliss@sp64.csrd.uiuc.edu (Brian Bliss) Organization: Center for Supercomputing Research and Development Lines: 96 Approved: compilers@esegue.segue.boston.ma.us > I'm not so sure that LALR(1) is equivalent to LALR(k) Any deterministic context free language can be recognized with a DPDA, and for any DPDA, we can construct a LR(1) grammar. so the class of languages recognizable with LR(k) grammars is the same as the class of languages recognizable with LR(1) grammars. There are grammars (not languages!), however that can be parsed with k tokens of lookahead and not with just 1. The following grammar is LR(2) but not LR(1): S-> A T A-> a | a b T-> b | c Consider the string "abc". after the "a" is absorbed, we must decide whether to reduce by "A-> a" or "A-> a b", with only the "b" as the lookahead token. at this point, we have the conflict: "abc" a shift a b reduce A s A c r A T r S vs. "ab" a r A s A b r A T r S The grammar is LR(2) (and LALR(2)) Languages which are recognized with LR(1) grammars can also be recognized with LALR(1) grammars - from the LR(1) grammar, construct the canonical set of items (states) needed for the DPDA if we merge items with the same set of productions (but possibly different lookaheads) and get no conflicts, we have a LALR(1) parser. If merging items results in a conflict, we must have two items: @ L-> nonterm_string, a and L-> nonterm_string, b where merging to L-> nonterm_string, a b produces a conflict so introduce 2 new nonterminals L' to your grammar. - for every production A-> ... L ... add L'-> nonterm_string if a and not b could legally follow L in a sentential form. and add A-> ... L' ... delete A-> ... L ... add L''-> nonterm_string if b and not a could legally follow L in a sentential form. and add A-> ... L'' ... delete A-> ... L ... otherwise, leave the production unchanged - only one of A-> ... L ... A-> ... L' ... A-> ... L'' ... is in the new grammar - we haven't made it ambiguous. @ If there exists more than two items producing the conflict, either 3 or more being merged to the same LALR item, or 2 or more different sets being merged to diffent LALR items, repeat this process, merging only two conflicting items each time. So if L (g) means the class languages recognized by a set of grammars n, and G(n) means the set of grammars parsed in some manner n L(G(LALR(1))) = L(G(LR(1))) = L(G(LALR(k))) = L(G(LR(k))) but G(LALR(1)) != G(LR(1)) != G(LALR(k)) != G(LR(k)) bb -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.