Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!ima!johnl
From: johnl@ima.UUCP
Newsgroups: comp.compilers
Subject: Re: Yacc poll
Message-ID: <654@ima.ISC.COM>
Date: Fri, 14-Aug-87 13:38:41 EDT
Article-I.D.: ima.654
Posted: Fri Aug 14 13:38:41 1987
Date-Received: Sun, 16-Aug-87 09:44:19 EDT
References: <642@ima.ISC.COM> <378@hubcap.UUCP>
Sender: johnl@ima.ISC.COM
Reply-To: ihnp4!m10ux!mnc (52171-Michael Condict--MHx7002 (mh****)m000)
Lines: 148
Approved: compilers@ima.UUCP
Summary: Writing recursive descent parsers

In article <378@hubcap.UUCP>, steve@hubcap.clemson.edu ("Steve" Stevenson) writes:
> Hand-writing a recursive descent parser makes as much sense as hand-writing
> a LALR(1) parser.  Why don't you use yacc-lex to write a simple input
> mimic of yacc, then use a reasonable plagiarism (like from Backhouse's
> book.)
> 

Actually, it makes a lot more sense.  The natural control structure for
implementing a recursive-descent parser is a set of mutually recursive
procedures, one per syntactic construct.  This is a much simpler structure,
bearing a much more obvious resemblance to the grammar, than the structure
that results from hand-coding a bottom-up parser.  In fact, writing
the control structure of a recursive-descent parser from the grammar can be
done as fast as you can type it in, and will result in a program that is
no larger than a yacc grammar.  The only difficult part (and it is not
difficult for a well-designed language) is to figure out the set of tokens
that can begin each type of construct, so that you can decide which of
several recursive procedures to call (or whether to return).  I've been
able to do this correctly in my head as I typed in the parser for several
programming languages, including C and Pascal.

The advantages of a hand-written parser include:

(1) Speed.  By a few carefully chosen modifications of the program produced by
    a literal transcription of the grammar, an experienced programmer can easily
    obtain a parser that is substantially faster than yacc.  If pressed, I have
    examples supporting this claim.  The obvious optimizations are to collapse
    the code of separate, non-recursive constructs and to use some sort of
    precedence-based technique for handling the expression portion of the
    language.  Of course it is also easy to obtain recursive-descent parsers
    that are slower than yacc, but at least you can control the efficiency.

(2) Flexibility in error-reporting and recovery.  You control the code -- you
    can do anything you want.  This should be obvious.  And as someone pointed
    out, you can use abstractions like: EXPECT(token, msg, skip_set) to mean

	"Expect to find the specified token as the current token.  If so, read
	past it; otherwise print the specified error message and read tokens
	until you find the expected one or one in the skip_set.  If the
	expected one was found, read past it."

    This produces error recovery that is at least the same quality as in yacc
    and at least as obvious from looking at the code.

(3) Ease of data allocation in the semantic code.  You automatically get a
    fresh stack frame of variables for each nested occurrence of a particular
    construct.  Does anyone know how to implement nested constructs such as FOR
    loops in a yacc parser without having to explicitly implement a stack to
    keep track of information about each currently active for loop?  This is a
    real pain in the butt!

(4) Flexibility in use of the parser.  Unlike a yacc parser, your parser can
    (by definition) be called recursively.  For instance, suppose you decide
    that in order to process the semantics of the language construct you are
    looking at right now, you need to first process some declaration it depends
    on, which is in a different file.  What you want to do is suspend the
    current parse, reinvoke the parser on the other file, then come back to
    what you were doing.  This is trivial in a hand-written parser, and
    infeasible (as far as I know) in a yacc parser.

The only disadvantage of a hand-written parser that I am aware of is that, for
an experimental language whose syntax is changing rapidly, the effort spent
modifying the parser may be considerable.  This is not a serious impediment
for a programmer who is experienced with recursive-descent parsers, but is a
consideration if the parser is to be maintained by programmers who have always
done their parsers with canned generators like yacc.

To summarize with a controversial claim: parser generators are a classic
example of a solution looking for a problem.  They exist primarily because
they are fun to write and because automated parsing techniques are fun for
computer scientists to study (or at least easy to study).  After using yacc in
three different commercial-quality projects, I do not find that is has saved
me any significant amount of labor, certainly not enough to justify the
restrictions it has put upon my parsers.  Parsing is the most straightforward
part of language processing, the part that is easiest to do well without any
table-driven or program-generator techniques.  It is the semantic processing
that is least-well understood and which is usually implemented quite poorly.


Michael Condict		{ihnp4|vax135|cuae2}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ

------------------

P.S.

To support my claim about the resemblance of a recursive-descent parser to
the grammar, here is a short example.  Note that the parser does more
than the grammar, namely error reporting and recovery, using the above-
described EXPECT macro:

	Grammar				    R.D. Parser
	------				    -----------
				TOK_TYPE tok, gettok();

prog:	  stmt			void prog() { tok=gettok();
	| prog stmt		    while (tok != EOF_TOK) stmt();
	;			}

stmt:	  			void stmt() {
				    switch (tok) {
	  IF				case IF: tok=gettok();
	      expr			    expr();
	  THEN				    EXPECT(THEN, "THEN expected", END);
	  				    if (tok == END) break;
	      stmt			    stmt();
					    EXPECT(END, "END IF expected", END);
	  END IF			    EXPECT(IF, "END IF expected", END);
					    break;
	|				case ID: tok=gettok();
	  ID ':='			    EXPECT(ASSN, "':=' expected", ';');
		expr ';'		    expr();
					    EXPECT(';', "';' expected", ';');
					    break;
					default:
					    error("Bad token beginning stmt");
					    tok=gettok();
				    }
	;			}

expr:	  			void expr() {
	primary    		    primary();
				    while (1) {
					if (tok == '+') {
	| expr '+' primary		    tok=gettok();  primary();
					} else if (tok == '-'); {
	| expr '-' primary		    tok=gettok();  primary();
					} else
						break;
				    }
	;			}


primary:  ID			void primary() {
				    EXPECT(ID, "ID expected in expr", ';');
	;			}
[I've written my share of commercial compilers, and found that yacc saved me an
enormous amount of time.  To each his own.  It occurs to me that it shouldn't
be all that hard to change the yacc parser so that it implements the kind of
error recovery that people have been suggesting for RD, to fake up a stream
of tokens that lets the parse continue.  Has anyone done that, or are there
pitfalls I hadn't noticed?  -John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request