Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watmath!clyde!ima!johnl From: johnl@ima.UUCP Newsgroups: comp.compilers Subject: Re: Yacc poll Message-ID: <654@ima.ISC.COM> Date: Fri, 14-Aug-87 13:38:41 EDT Article-I.D.: ima.654 Posted: Fri Aug 14 13:38:41 1987 Date-Received: Sun, 16-Aug-87 09:44:19 EDT References: <642@ima.ISC.COM> <378@hubcap.UUCP> Sender: johnl@ima.ISC.COM Reply-To: ihnp4!m10ux!mnc (52171-Michael Condict--MHx7002 (mh****)m000) Lines: 148 Approved: compilers@ima.UUCP Summary: Writing recursive descent parsers In article <378@hubcap.UUCP>, steve@hubcap.clemson.edu ("Steve" Stevenson) writes: > Hand-writing a recursive descent parser makes as much sense as hand-writing > a LALR(1) parser. Why don't you use yacc-lex to write a simple input > mimic of yacc, then use a reasonable plagiarism (like from Backhouse's > book.) > Actually, it makes a lot more sense. The natural control structure for implementing a recursive-descent parser is a set of mutually recursive procedures, one per syntactic construct. This is a much simpler structure, bearing a much more obvious resemblance to the grammar, than the structure that results from hand-coding a bottom-up parser. In fact, writing the control structure of a recursive-descent parser from the grammar can be done as fast as you can type it in, and will result in a program that is no larger than a yacc grammar. The only difficult part (and it is not difficult for a well-designed language) is to figure out the set of tokens that can begin each type of construct, so that you can decide which of several recursive procedures to call (or whether to return). I've been able to do this correctly in my head as I typed in the parser for several programming languages, including C and Pascal. The advantages of a hand-written parser include: (1) Speed. By a few carefully chosen modifications of the program produced by a literal transcription of the grammar, an experienced programmer can easily obtain a parser that is substantially faster than yacc. If pressed, I have examples supporting this claim. The obvious optimizations are to collapse the code of separate, non-recursive constructs and to use some sort of precedence-based technique for handling the expression portion of the language. Of course it is also easy to obtain recursive-descent parsers that are slower than yacc, but at least you can control the efficiency. (2) Flexibility in error-reporting and recovery. You control the code -- you can do anything you want. This should be obvious. And as someone pointed out, you can use abstractions like: EXPECT(token, msg, skip_set) to mean "Expect to find the specified token as the current token. If so, read past it; otherwise print the specified error message and read tokens until you find the expected one or one in the skip_set. If the expected one was found, read past it." This produces error recovery that is at least the same quality as in yacc and at least as obvious from looking at the code. (3) Ease of data allocation in the semantic code. You automatically get a fresh stack frame of variables for each nested occurrence of a particular construct. Does anyone know how to implement nested constructs such as FOR loops in a yacc parser without having to explicitly implement a stack to keep track of information about each currently active for loop? This is a real pain in the butt! (4) Flexibility in use of the parser. Unlike a yacc parser, your parser can (by definition) be called recursively. For instance, suppose you decide that in order to process the semantics of the language construct you are looking at right now, you need to first process some declaration it depends on, which is in a different file. What you want to do is suspend the current parse, reinvoke the parser on the other file, then come back to what you were doing. This is trivial in a hand-written parser, and infeasible (as far as I know) in a yacc parser. The only disadvantage of a hand-written parser that I am aware of is that, for an experimental language whose syntax is changing rapidly, the effort spent modifying the parser may be considerable. This is not a serious impediment for a programmer who is experienced with recursive-descent parsers, but is a consideration if the parser is to be maintained by programmers who have always done their parsers with canned generators like yacc. To summarize with a controversial claim: parser generators are a classic example of a solution looking for a problem. They exist primarily because they are fun to write and because automated parsing techniques are fun for computer scientists to study (or at least easy to study). After using yacc in three different commercial-quality projects, I do not find that is has saved me any significant amount of labor, certainly not enough to justify the restrictions it has put upon my parsers. Parsing is the most straightforward part of language processing, the part that is easiest to do well without any table-driven or program-generator techniques. It is the semantic processing that is least-well understood and which is usually implemented quite poorly. Michael Condict {ihnp4|vax135|cuae2}!m10ux!mnc AT&T Bell Labs (201)582-5911 MH 3B-416 Murray Hill, NJ ------------------ P.S. To support my claim about the resemblance of a recursive-descent parser to the grammar, here is a short example. Note that the parser does more than the grammar, namely error reporting and recovery, using the above- described EXPECT macro: Grammar R.D. Parser ------ ----------- TOK_TYPE tok, gettok(); prog: stmt void prog() { tok=gettok(); | prog stmt while (tok != EOF_TOK) stmt(); ; } stmt: void stmt() { switch (tok) { IF case IF: tok=gettok(); expr expr(); THEN EXPECT(THEN, "THEN expected", END); if (tok == END) break; stmt stmt(); EXPECT(END, "END IF expected", END); END IF EXPECT(IF, "END IF expected", END); break; | case ID: tok=gettok(); ID ':=' EXPECT(ASSN, "':=' expected", ';'); expr ';' expr(); EXPECT(';', "';' expected", ';'); break; default: error("Bad token beginning stmt"); tok=gettok(); } ; } expr: void expr() { primary primary(); while (1) { if (tok == '+') { | expr '+' primary tok=gettok(); primary(); } else if (tok == '-'); { | expr '-' primary tok=gettok(); primary(); } else break; } ; } primary: ID void primary() { EXPECT(ID, "ID expected in expr", ';'); ; } [I've written my share of commercial compilers, and found that yacc saved me an enormous amount of time. To each his own. It occurs to me that it shouldn't be all that hard to change the yacc parser so that it implements the kind of error recovery that people have been suggesting for RD, to fake up a stream of tokens that lets the parse continue. Has anyone done that, or are there pitfalls I hadn't noticed? -John] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request