Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!decwrl!decvax!ima!esegue!compilers-sender From: worley@compass.com (Dale Worley) Newsgroups: comp.compilers Subject: Error reporting in LR parsers Message-ID: <1989Aug4.032053.753@esegue.uucp> Date: 4 Aug 89 03:20:53 GMT Sender: compilers-sender@esegue.uucp Reply-To: worley@compass.com (Dale Worley) Organization: Compilers Central Lines: 52 Approved: compilers@esegue.segue.bos.ma.us The idea of adding a 'yyexpected' function might sound good at first, but there are a number of problems with it. As was pointed out before, the number of expected terminals will probably be quite large, as this will include terminals on which a shift is possible and terminals on which a reduction is possble. For a grammar for Pascal, for example, this could be 30 or more terminals in some spots. Also, you would probably have to modify the Yacc source code to do this because Yacc inserts default reduction actions. These default reduction actions eliminate the lookahead information associated with reductions. I've used similar techniques, and a few simple fixes can solve a lot of these problems. The first is to define "token groups", which are groups of tokens the user thinks of as being similar. For instance, '+', '-', '*', etc. can be included in the group 'operator'; '(', 'IDENTIFIER', '-', etc., can be included in the group 'expression'. Now, when a syntax error is found, the error-reporter assembles a list of acceptable tokens. Then it checks each token group, and if the group is contained in the list of acceptable tokens, all those tokens are removed from the list, and the name of the token group is added. (In practice, only about three members of the group need to be checked for membership, and if one token group overlaps another, you have to be careful about the order in which the groups are tested.) The result is that you can usually reduce the list of allowed tokens to about three items, and produce error messages like: 34 i = j k; ^ Syntax error. Found: IDENTIFIER 'k' Expecting: OPERATOR ; END The trick of only checking a few members of the group can be used to defeat the problem of losing information because of default reductions. The way to do this is to only test members whose validity will not be eliminated by the default reductions. In the above example, 'j' may be reduced to 'expression' before the error is found, but if the 'operator' group is represented only by '+', then the fact that '*' cannot follow 'expression' will not stop the 'operator' group from showing in the list of expected tokens. Dale -- Dale Worley, Compass, Inc. worley@compass.com No one hates so bitterly as the lazy and poor contemplating the energetic and prosperous. -- -- Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU { decvax | harvard | yale | bbn }!ima. Meta-mail to ima!compilers-request. Please send responses to the author of the message, not the poster.