Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!husc6!rutgers!ucla-cs!zen!ucbvax!decvax!ima!johnl From: johnl@ima.UUCP Newsgroups: comp.compilers Subject: recursive-descent error recovery Message-ID: <665@ima.ISC.COM> Date: Mon, 17-Aug-87 11:45:35 EDT Article-I.D.: ima.665 Posted: Mon Aug 17 11:45:35 1987 Date-Received: Wed, 19-Aug-87 01:26:45 EDT Sender: johnl@ima.ISC.COM Reply-To: cullvax!drw@EDDIE.MIT.EDU (Dale Worley) Organization: Cullinet Software, Westwood, MA, USA Lines: 86 Approved: compilers@ima.UUCP OK, he wants flames... chuck@amdahl.amdahl.com (Charles Simmons) writes: > A while ago I suggested that one cannot write a reasonable > compiler using YACC. ... I claim it is impossible to write a reasonable > compiler using YACC. ... ... I further claim it is impossible to produce > reasonable error messages using YACC. I've been working on a production compiler that uses Yacc for parsing, and I've given a bit of thought to this. I claim that it is indeed possible to produce good syntax error messages. The strategy is to print out several things: 1. a message that a syntax error has been found 2. a pointer to the token which caused the parser to barf (This is extremely important -- in practice this pointer is almost all a programmer needs. Compare this to the usual style: "line 37: invalid declaraction" when line 37 contains a nasty mess of structs and unions.) 3. a description of the token that was found (In case it tokenizes differently than the programmer expected.) 4. a listing of the tokens that are valid in this position (This gets long in some cases (expression operand expected, operator expected, and statement verb expected), so some special case code is needed to condense these forms.) This functions as a quick refresher of the grammar for the programmer. This isn't so important for languages like C with a small grammar, but its very useful for languages like ours that have over 50 different statements. (uck!) 5. a summary of where the parse stands now: statement-list ; l-value = expression + ( All of this stuff is immediately accessible from Yacc's tables. I'll admit that all of this (except 5) can be done in recursive descent, but usually people who code recursive descent don't bother. For example, in one popular C compiler, the construction struct { int a, reel /* <- note misspelling */ b } ; produces the message "closing brace expected", because when the parser sees 'reel' (which the lexer returns as 'identifier' rather than 'type name'), it says "Aha! He forgot to close off his structure declaraction." I'd much rather the compiler just tell me that I made an error, and not try to guess what I meant. As far as size and speed, I doubt that a recursive-descent parser for a complex grammar (say, 750 Yacc states) would be anywhere near as small, because the number of read-and-test code segments in an RD parser has to be about as large as the number of Yacc states. There are about 4 int's in Yacc's tables for each state, which is a lot more compact than the code you're going to generate to read-token-and-test. As far as needing an external stack for looping constructs, give me a break! Yacc maintains a stack, use it! /* generating the code for the expressions in the order they are * written requires more goto's than necessary, but otherwise I have * to write code to read and store them and then emit them later */ for_statement : FOR '(' expression ';' $remember_loc expression $goto_on_false $goto ';' $remember_loc expression $goto ')' $remember_loc statement $goto { /* if test is false, exit for */ backpatch($7, current_object_location); /* if test is true, go to body */ backpatch($8, $14); /* after step expression, go to test */ backpatch($12, $5); /* after body, go to step expression */ backpatch($16, $10); } ; $remember_loc : { $$ = current_object_location } ; $goto : { emit a goto statement, $$ = location where it's destination location will be filled in later by backpatch() }; $goto_on_false : { emit a conditional goto, $$ = destination location } ; -- Dale Worley Cullinet Software ARPA: cullvax!drw@eddie.mit.edu UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request