Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!sdd.hp.com!spool2.mu.edu!uunet!world!iecc!compilers-sender
From: megatest!djones@decwrl.dec.com (Dave Jones)
Newsgroups: comp.compilers
Subject: Re: Set of allowable next tokens
Keywords: yacc, debug, LALR, parse
Message-ID: <14868@goofy.megatest.UUCP>
Date: 15 Jan 91 01:39:01 GMT
References: <1991Jan11.203916.20495@murdoch.acc.Virginia.EDU>
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: megatest!djones@decwrl.dec.com (Dave Jones)
Organization: Megatest Corporation, San Jose, Ca
Lines: 46
Approved: compilers@iecc.cambridge.ma.us

> Is there any way i can configure yacc or some other tool to tell me,
> at each step in a parse, the set of tokens that may follow the current
> token?  I am currently doing this by hand, and it is, as they say,
> a tedious and error-prone process.

> [This same question came up in November, but no answers. -John]


There were plenty of answers John. Some of them were wrong, but not
all. I do agree that the definitive answer has yet to be written.
Anyway, here's the crux of the problem:

The table-driven parser which a typical implementation of yacc generates
keeps state-information on a stack. The things it puts on the stack are
called "states" in the jargon, although the entire stack really comprises the
parser's state. It is easy to determine which tokens can be shifted when a
particular state is on top of the stack. To do that you can have your parser
look in the tables, just as the regular parser does when it is doing its
thing, or you can build something that reads the y.output file you get using
the -v option. (AWK might come in handy.)  But knowing which tokens can be
shifted when a particular state is on top is not enough, because other states
that might have allowed other tokens to be shifted may have been on top of
the stack since the previous token was shifted. That is possible because yacc
typically uses "default reductions". (You can find the technical info on this
stuff in the _Dragon Book_, new or old version. In my copy of the new Dragon
Book, the applicable part begins on page 244.)

So what you have to do is to figure out what tokens *might* have gotten
shifted in states that have been "default reduced" since the last token was
shifted. Of course, you could just keep a token-set, coded as a bit map. That
would be a straight-forward way to do it, and obviously correct, but would
impose some overhead even when the input was error-free. Several people,
including myself, posted algorithms that they claimed could reconstruct the
sets after the fact, some by analysing the stack by running simulations,
others by keeping some bookkeeping info on what states have been on the stack
since the last shift. The only one I have a copy of is the algorithm I
suggested, and it is no longer as obvious to me as it was then.

More later...
[My brain must have curdled, I forgot about the lively discussion in mid-1989
about finding the follow set in the context of error correction and detection.
There was a lot of discussion including several trenchant messages from Mr.
Jones, and several proposed solutions.  -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.