Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!rutgers!ucla-cs!zen!ucbvax!decvax!ima!johnl
From: johnl@ima.UUCP
Newsgroups: comp.compilers
Subject: Re: recursive-descent error recovery
Message-ID: <662@ima.ISC.COM>
Date: Mon, 17-Aug-87 03:17:24 EDT
Article-I.D.: ima.662
Posted: Mon Aug 17 03:17:24 1987
Date-Received: Tue, 18-Aug-87 05:24:10 EDT
References: <634@ima.ISC.COM> <642@ima.ISC.COM> <651@ima.ISC.COM>, <655@ima.ISC.COM>
Sender: johnl@ima.ISC.COM
Reply-To: decvax!utzoo!henry
Lines: 43
Approved: compilers@ima.UUCP

> This seems like a pain.  Recursive descent with error recovery performed
> by the higher level entity would seem to be simpler, namely because the
> higher level entity knows more about what's going on...

On the contrary, doing it at the higher level is a horrendous pain, and the
smooth simplicity of the low-level recovery (which has detailed guidance
from the higher level, remember) is an enormous win.  You have to experience
the difference to fully appreciate it -- I've written parsers both ways.

The problems with doing it at the higher level boil down to (a) it adds a
lot of complexity to the code, and (b) error repair often has to cross
the boundaries of syntactic structures, which is painful in recursive
descent because those are function boundaries in the parser.  By contrast,
the low-level approach requires 100 lines or so of code to handle *all*
syntactic error repair for the entire compiler, and it's all in one place
rather than interspersed throughout the parser.
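
To make the "all in one place" point concrete, here is a minimal sketch in
Python.  Everything in it is hypothetical and invented for illustration: the
toy grammar, the Tokens class, and the names kind, fake, and expect.  The
repair policy shown (always pretend the requested token was present) is the
crudest possible stand-in, not the policy of any real compiler.  What matters
is the shape: the parsing functions contain no error handling at all.

```python
# Hypothetical sketch: a toy grammar whose parse functions never test for
# errors.  All detection and repair happens inside Tokens.expect().

def kind(tok):
    """Classify a token string: 'num', 'id', or the punctuation itself."""
    if tok.isdigit():
        return "num"
    if tok.isidentifier():
        return "id"
    return tok

def fake(wanted):
    """Fabricate a placeholder token of the requested kind."""
    return {"id": "<missing-id>", "num": "0"}.get(wanted, wanted)

class Tokens:
    def __init__(self, toks):
        self.toks = list(toks) + ["<eof>"]
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.toks[self.pos]

    def expect(self, wanted):
        """The single place where syntax errors are noticed and repaired."""
        tok = self.peek()
        if kind(tok) == wanted:
            self.pos += 1
            return tok
        self.errors.append(f"expected {wanted}, found {tok!r}")
        # Assumed policy: act as if the requested token had been there and
        # leave the input untouched.
        return fake(wanted)

def parse_stmt(ts):            # stmt := id '=' expr ';'
    name = ts.expect("id")
    ts.expect("=")
    value = parse_expr(ts)
    ts.expect(";")
    return ("assign", name, value)

def parse_expr(ts):            # expr := term { '+' term }
    node = parse_term(ts)
    while ts.peek() == "+":
        ts.expect("+")
        node = ("+", node, parse_term(ts))
    return node

def parse_term(ts):            # term := id | num | '(' expr ')'
    if ts.peek() == "(":
        ts.expect("(")
        node = parse_expr(ts)
        ts.expect(")")
        return node
    if kind(ts.peek()) == "num":
        return ts.expect("num")
    return ts.expect("id")
```

Feeding this the tokens of "x = ( a + 1 ;" records one complaint about the
missing ")" and still returns a complete tree for the statement, although no
parsing function knows that anything went wrong.  A policy that only ever
fabricates tokens cannot cope with junk in the input, so a usable version
would also have to discard input in some cases.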

> ... Okay...  Now I've told the user it screwed
> up, let's recover from this sucker.  The simplest thing to do, since
> I'm in an expression, is to toss tokens until I get to a synchronizing
> token like a semi...

Just where does the code that does this reside?  Remember that the code
for parsing an expression is big and complicated, and may be spread over
several functions.  (In a straightforward recursive-descent parser, it
will be a dozen or more functions.)  They all have to cooperate very
carefully to make even such a crude algorithm work.  This takes a lot
more effort and code than you would think.

Also, you've missed an (admittedly non-obvious) point in my contribution.
The error repair need not work at anywhere near as gross a level as throwing
away everything until the next semicolon.  That is necessary as a "backstop"
algorithm, but the more local resync heuristic can be something like "if the
input token is punctuation and the requested one is not, throw away the input,
otherwise keep it".  This repairs many minor goofs promptly and *correctly*.
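
A sketch of that local heuristic, in Python, under stated assumptions.  The
punctuation set and the name repair are hypothetical.  I am also assuming one
particular reading of the rule: "throw away" means the requested token
replaces the offending input token, and "keep" means the requested token is
supplied in front of it.  The semicolon backstop is left out.

```python
# Hypothetical punctuation set for a toy language.
PUNCTUATION = {";", ",", "(", ")", "=", "+"}

def repair(tokens, pos, wanted):
    """Called only after a mismatch between tokens[pos] and `wanted`.

    Returns (supplied_token, new_pos).  The requested token is always
    supplied; the only decision is whether the offending input token
    is discarded or left for the next request.
    """
    found = tokens[pos]
    if found in PUNCTUATION and wanted not in PUNCTUATION:
        # Stray punctuation where a name or number was wanted: discard it.
        return wanted, pos + 1
    # Otherwise the input is probably good and something was left out.
    return wanted, pos
```

For "x = + ;", a request for an identifier at the "+" discards the "+" and
the statement then finishes normally at the ";".  For "x = a  y = b ;", a
request for ";" at the "y" keeps the "y", so the missing semicolon is in
effect inserted and the second statement parses intact.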

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry