Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!world!iecc!compilers-sender From: S.R.Adams@ecs.southampton.ac.uk (Stephen Adams) Newsgroups: comp.compilers Subject: Generating LR parsers from EBNF? Keywords: LR(0), parse, question Message-ID: <29305.9105072033@pessoa.ecs.soton.ac.uk> Date: 7 May 91 20:33:42 GMT Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: Stephen Adams Organization: Southampton University Computer Science Lines: 57 Approved: compilers@iecc.cambridge.ma.us A recent posting (sorry, I can't remember exactly which) mentioned that someone's parser generator used an extended BNF notation directly for generating the LR automaton. I had always been lead to believe that an EBNF grammar is first translated into a context free grammar (CFG) and then that is used for parse table generation. I thought about the idea and tried a few tiny examples by hand. The main trick is to reason about the placing of the `dot' in the EBNF items. Shifting the dot in an item may produce more than one derived item. For example, consider the simple grammar for nested parenthesized lists of numbers: s -> ( {s} ) s -> number `{s}' menas 0 or more s's. This grammar might generate `5' or `(1 2 (3 4) 5)'. The item `s -> . ( {s} )' shifted to two items, `s -> ( { . s} )' and `s -> ( {s} . )' The state containing these two items would usually be two states in a CFG generated automaton. The parse tables came out smaller. The EBNF generated table was smaller because some table slots in the CFG generated table were being replaced with sequences of operations, for example a reduction of a null production might be replaced by a compound operation `reduce and goto'. After these changes some states and some columns of the goto table are never used and may be removed. I have looked in several books and a couple of conference proceedings but I have failed to find any references on generating LR parse tables directly from an EBNF grammar. What I would like to know is: 1. Are there any references? 2. Is the EBNF approach equivalent to the CFG one? Is the difference in the tables always a due to small set of `optimizations'? 3. Which is faster? The EBNF item sets are smaller but more complex to handle. Using the EBNF approach seems to reduce the need for `optimizing' the generated table. 4. I only generated the LR(0) automaton by hand. I have not thought about `higher' grammar categories like LALR(1), LR(2), regular-LR etc. Are these kinds easily generated using the EBNF method? Stephen Adams S.R.Adams@ecs.soton.ac.uk Computer Science S.R.Adams@sot-ecs.uucp Southampton University Southampton SO9 5NH, UK -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.