Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!world!iecc!compilers-sender
From: S.R.Adams@ecs.southampton.ac.uk (Stephen Adams)
Newsgroups: comp.compilers
Subject: Generating LR parsers from EBNF?
Keywords: LR(0), parse, question
Message-ID: <29305.9105072033@pessoa.ecs.soton.ac.uk>
Date: 7 May 91 20:33:42 GMT
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: Stephen Adams <S.R.Adams@ecs.southampton.ac.uk>
Organization: Southampton University Computer Science
Lines: 57
Approved: compilers@iecc.cambridge.ma.us

A recent posting (sorry, I can't remember exactly which) mentioned that
someone's parser generator used an extended BNF notation directly for
generating the LR automaton.  I had always been lead to believe that an
EBNF grammar is first translated into a context free grammar (CFG) and
then that is used for parse table generation.

I thought about the idea and tried a few tiny examples by hand.  The main
trick is to reason about the placing of the `dot' in the EBNF items.
Shifting the dot in an item may produce more than one derived item.  For
example, consider the simple grammar for nested parenthesized lists of
numbers:

	s -> ( {s} )
	s -> number

`{s}' menas 0 or more s's.  This grammar might generate `5' or 
`(1 2 (3 4) 5)'.

The item `s -> . ( {s} )' shifted to two items, `s -> ( { . s} )' and
`s -> ( {s} . )'  The state containing these two items would usually be two
states in a CFG generated automaton.

The parse tables came out smaller.  The EBNF generated table was smaller
because some table slots in the CFG generated table were being replaced
with sequences of operations, for example a reduction of a null production
might be replaced by a compound operation `reduce and goto'.  After these
changes some states and some columns of the goto table are never used and
may be removed.

I have looked in several books and a couple of conference proceedings but
I have failed to find any references on generating LR parse tables
directly from an EBNF grammar.  What I would like to know is:

  1.  Are there any references?

  2.  Is the EBNF approach equivalent to the CFG one?  Is
      the difference in the tables always a due to small set
      of `optimizations'?

  3.  Which is faster?  The EBNF item sets are smaller but
      more complex to handle.  Using the EBNF approach seems
      to reduce the need for `optimizing' the generated
      table.

  4.  I only generated the LR(0) automaton by hand.  I have
      not thought about `higher' grammar categories like
      LALR(1), LR(2), regular-LR etc.  Are these kinds
      easily generated using the EBNF method?


Stephen Adams                        S.R.Adams@ecs.soton.ac.uk
Computer Science                     S.R.Adams@sot-ecs.uucp
Southampton University
Southampton SO9 5NH, UK
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.