Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!hsdndev!spdcc!iecc!compilers-sender From: jar@florida.HQ.Ileaf.COM id AA07671; Sun, 21 Apr 91 00:33:37 -0400 (Jim Roskind x5570) Newsgroups: comp.compilers Subject: Technical question about yacc and LALR parsers Keywords: yacc, parse, question, errors Message-ID: <9104210204.AA07587@florida.HQ.Ileaf.COM> Date: 21 Apr 91 02:04:33 GMT Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: jar@florida.HQ.Ileaf.COM (Jim Roskind x5570) Organization: Compilers Central Lines: 55 Approved: compilers@iecc.cambridge.ma.us I have been spending this weekend hacking byacc to automate the process that I go through to evaluate conflicts. One part of this is walking backwards through a parser table. When I generated backwards pointers for each state in the parse, I noticed something rather surprising (at least to me). Basically, for each state k in the parser, there are many states that can transition to k (no surprise), but all such states transition with for the same reason (I was surprised) i.e, all shifts/goto's into state k are made on the same terminal/nonterminal. To put it another way, when the parser generator is sitting in any given state k, the token just to the left of the carrot is a function of k only. When I first coded my data structures, I assumed that for each state k, I would need a list of prior states, and I also wanted to save the token name that was used to transition from the "prior state". When all was said and done, I found that it was sufficient to save a "list of prior states" for each state, as well as a single "prior token" for each state. If I haven't still made myself clear, the following is an example of something that will *never* be found in the verbose output from yacc: state 727 parened_type : '(' VOID . ')' parened_type : '(' INT . ')' Basically, if this was ever seen, it would mean that in state 727, it was possible to have two distinct tokens to the immediate left of the carrot. Note that the following is possible (I think): state 747 gathered_type : '[' VOID . ']' gathered_type : '(' VOID . ')' as it is certainly *not* the case that the state defines the entire left context of the carrot. One corollary (via the pigeon hole principle) of this observation is that there must me more states in such a parser than there are terminals plus nonterminals. Assuming that I have explained myself, the question is: Is this property true of all LALR parsers? It was very conceivable that I could hand code a table driven "LR like" parser that would *not* have this property. Needless to say, my fear is that my "simplified data structures" are not sufficient for all grammars that byacc might generate. Jim Roskind- Author of a YACCable C++ grammar Independent Consultant (407)729-4348 or (617)290-4990 x5570 jar@hq.ileaf.com or ...!uunet!leafusa!jar -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.