Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!husc6!rutgers!ucla-cs!zen!ucbvax!decvax!ima!johnl From: johnl@ima.UUCP Newsgroups: comp.compilers Subject: recursive-descent error recovery Message-ID: <651@ima.ISC.COM> Date: Thu, 13-Aug-87 20:05:29 EDT Article-I.D.: ima.651 Posted: Thu Aug 13 20:05:29 1987 Date-Received: Sun, 16-Aug-87 05:45:28 EDT References: <634@ima.ISC.COM>, <642@ima.ISC.COM> Sender: johnl@ima.ISC.COM Reply-To: decvax!utzoo!henry Lines: 51 Approved: compilers@ima.UUCP > [I'd be interested in hearing what the recursive descent lexer feeds to the > parser when the input is something like a = b + ( ; -John] The basic approach in the lexer is (a) give the customer what he wants, and (b) if what he wants doesn't match what's in the input, apply some simple heuristics to try to resynchronize. Say we've just parsed the "b", so the parser knows we're in an expression that is part of a statement; here is the dialogue, slightly simplified: Parser: "Do you have any of ?" Lexer: "Yes, I have '+'." P (to itself): "Aha, there's more to this expression." P: "Do you have any of ?" L: "Yes, I have '('." P (to itself): "Aha, a parenthesized expression." P: "Give me one of ." L (to itself): " That's not in my input! I'm going to have to fake it. Let's see... what I see is a line terminator, and he's not asking for a line terminator, so I will try to fake the rest of the line to resynchronize. What's the most harmless thing I can give him, out of the list he asked for?..." L: "I have '0'." P: "Do you have any of ?" L (to itself): "To avoid infinite loops, I should try to cut this short." L: "No." P (to itself): "Must be the end of the expression." P: "Give me ')'." L (to itself): "Keep on faking it..." L: "Okay, I have ')'." P: "Do you have any of ?" L (to itself): "Again, try to cut it short." L: "No." P (to itself): "Must be the end of the expression." P: "Give me ';'." L (to itself): " Resynchronized! Now I can stop lying." L: "I have ';'." ... and so forth. The details of resynchronization heuristics can get messy, especially in a language like C where clear-cut line-terminator tokens are hard to find (e.g. ';' also appears inside "for"), but the basic idea is simple and works well. It's not original with me; I think the first place where I saw it was in Dave Barnard's M.Sc. thesis, describing the SP/k parser and scanner. Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,decvax,pyramid}!utzoo!henry -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request