Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!sdd.hp.com!spool2.mu.edu!uunet!world!iecc!compilers-sender From: megatest!djones@decwrl.dec.com (Dave Jones) Newsgroups: comp.compilers Subject: Re: Set of allowable next tokens Keywords: yacc, debug, LALR, parse Message-ID: <14868@goofy.megatest.UUCP> Date: 15 Jan 91 01:39:01 GMT References: <1991Jan11.203916.20495@murdoch.acc.Virginia.EDU> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: megatest!djones@decwrl.dec.com (Dave Jones) Organization: Megatest Corporation, San Jose, Ca Lines: 46 Approved: compilers@iecc.cambridge.ma.us > Is there any way i can configure yacc or some other tool to tell me, > at each step in a parse, the set of tokens that may follow the current > token? I am currently doing this by hand, and it is, as they say, > a tedious and error-prone process. > [This same question came up in November, but no answers. -John] There were plenty of answers John. Some of them were wrong, but not all. I do agree that the definitive answer has yet to be written. Anyway, here's the crux of the problem: The table-driven parser which a typical implementation of yacc generates keeps state-information on a stack. The things it puts on the stack are called "states" in the jargon, although the entire stack really comprises the parser's state. It is easy to determine which tokens can be shifted when a particular state is on top of the stack. To do that you can have your parser look in the tables, just as the regular parser does when it is doing its thing, or you can build something that reads the y.output file you get using the -v option. (AWK might come in handy.) But knowing which tokens can be shifted when a particular state is on top is not enough, because other states that might have allowed other tokens to be shifted may have been on top of the stack since the previous token was shifted. That is possible because yacc typically uses "default reductions". (You can find the technical info on this stuff in the _Dragon Book_, new or old version. In my copy of the new Dragon Book, the applicable part begins on page 244.) So what you have to do is to figure out what tokens *might* have gotten shifted in states that have been "default reduced" since the last token was shifted. Of course, you could just keep a token-set, coded as a bit map. That would be a straight-forward way to do it, and obviously correct, but would impose some overhead even when the input was error-free. Several people, including myself, posted algorithms that they claimed could reconstruct the sets after the fact, some by analysing the stack by running simulations, others by keeping some bookkeeping info on what states have been on the stack since the last shift. The only one I have a copy of is the algorithm I suggested, and it is no longer as obvious to me as it was then. More later... [My brain must have curdled, I forgot about the lively discussion in mid-1989 about finding the follow set in the context of error correction and detection. There was a lot of discussion including several trenchant messages from Mr. Jones, and several proposed solutions. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.