Path: utzoo!attcan!uunet!cs.utexas.edu!yale!mintaka!spdcc!iecc!compilers-sender From: csr!nigelh@sol.uvic.ca (R. Nigel Horspool) Newsgroups: comp.compilers Subject: Re: Set of allowable next tokens Keywords: yacc, parse, debug Message-ID: Date: 12 Jan 91 23:37:21 GMT References: <1991Jan11.203916.20495@murdoch.acc.Virginia.EDU> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: csr!nigelh@sol.uvic.ca (R. Nigel Horspool) Organization: University of Victoria Lines: 76 Approved: compilers@iecc.cambridge.ma.us jmr1g@ra.cs.Virginia.EDU (Jeremiah M. Ratner) writes: >Is there any way i can configure yacc or some other tool to tell me, >at each step in a parse, the set of tokens that may follow the current >token? I am currently doing this by hand, and it is, as they say, >a tedious and error-prone process. Yacc optimizes its tables to use default reductions, so it is impossible to tell from the state listing in the y.output file which symbols are valid in any state that contains a reduce action. If you are content with an approximate answer, it can be obtained directly from LR parse tables that have not been optimized. E.g., the row of the LR parse table for state n might contain entries like: symbol a: shift to state s1 symbol b: shift to state s2 symbols {c,d,e}: reduce by rule r1 symbols {f,g}: reduce by rule r2 and the set of symbols for which actions are defined (i.e. {a,b,c,d,e,f}) is a good approximation to what symbols are allowed when state n is the top state on the LR state stack. This info is readily available from yacc's y.output file. The set is correct (not an approximation) if a full LR(1) parser generator is used. However, for the SLR(1) and LALR(1) methods (Yacc uses LALR(1)), the set of symbols that trigger reduce actions may be too large. To get an exact answer, it is necessary to take context into account to eliminate some impossible next symbols from the set. If you need to know the exact set of valid next symbols at some point in a parse, you can run an algorithm that simulates the updates that occur to the state stack when each reduce action is performed. An algorithm in pseudo Pascal goes something like this: ValidNextSymbols := Valid( SS, VT ); { where SS is the current state stack and VT is the set of all terminal symbols in the grammar} function Valid( SS, T ) Result := set of symbols, x, such that the top state on stack SS has a shift action for x or an accept action for x; Result := Result * T; for each reduce action in the top state on stack SS do Test := set of symbols that trigger the reduce action; Test := Test * T; if Test != empty then CSS := copy of SS; update the stack CSS as required by the reduce action; { i.e. pop k states where k is the length of the RHS of the rule being reduced by, and push the state reached by the goto action for the LHS symbol of the rule } Result := Result + Valid( CSS, Test ); end if end for; return Result; end function; {Note: `+' represents set union and `*' represents set intersection.} Since the algorithm checks which reduce symbols are valid (i.e. those that can eventually be shifted), it works equally well on parse tables that have been optimized by the use of default reductions. I.e., it could be implemented to work with the tables used by a yacc-generated parser. R. Nigel Horspool -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.