Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!uunet!world!iecc!compilers-sender From: mauney@eos.ncsu.edu (Dr. Jon Mauney) Newsgroups: comp.compilers Subject: Re: Looking for parsing references Keywords: parse, bibliography Message-ID: <91-06-032@comp.compilers> Date: 20 Jun 91 15:04:35 GMT Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: mauney@eos.ncsu.edu (Dr. Jon Mauney) Organization: North Carolina State University Lines: 67 Approved: compilers@iecc.cambridge.ma.us >I'm interested in references/pointers and also your views and opinions >on the following:> > >1. Tools/algorithms for parsing of general context free grammars %A Susan L. Graham %A Michael A. Harrison %A Walter L. Ruzzo %T An improved context-free recognizer %J TOPLAS %V 2 %N 3 %P 415-462 %D July 1980 Describes a general parsing algorithm, including a good discussion of the relationship between their algorithm, Earley's, and the Cocke-Kasami-Younger algorithm. >2. LL-parsing tools/algorithms The standard compiler texts do a pretty good job. For gory detail on LL, including how to test language-equivalence, see Aho-Ullman "The Theory of Parsing, Translation, and More Parsing" :-) volume 2. The LL(1) parser generator with syntax-error handling described in Fischer&LeBlanc "Crafting a Compiler [in C]" is available, as is (I believe) the one described by Holub in his book. >3. Advantages and disadvantages of LL and LR parsing as compared to each > other. The LL(1) parsing algorithm is simpler, and so perhaps easier to understand and implement. LL(1) tables tend to be a little smaller and you should be able to make the parser loop a hair faster than LR and its popular variants. But parsing speed is not usually an important factor, and these days the table size should be irrelevant. The form of LL(1) grammars is more restrictive: no left-recursion, no common prefixes. So it is easier to bash a grammar into LALR form than LL. But once you know what LL(1) requires, it really is not difficult to make most grammars LL(1). Mostly it is a matter of putting things into an idiomatic form; and people who program in C should have no problems with strange-looking idioms. Once a grammar is LL, calls to semantic actions can be placed anywhere. In LR, actions are normally associated with reductions only. LR parsers needs must restrict the placement of semantic actions, since in some cases it is not clear which production is being matched, and therefore which action should be done. These are exactly the places where LL(1) requires you to factor out the common prefixes. Recovery from syntax errors is easier with LL(1), because the information on the stack is straightforward: a list of the grammar symbols expected to match the rest of the input. with LR methods, the stack cleverly encodes information in states, and you must cleverly decode the states in order to figure how to repair the error. A question that periodically arises on the net is "how do I produce a list of tokens that could be accepted from a given parse stack in YACC?" The answer is "very painfully", but for an LL(1) parser it should be simple. -- Jon Mauney, parsing partisan Computer Science Dept. N.C. State University -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.