Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site burl.UUCP
Path: utzoo!watmath!clyde!burl!rcj
From: rcj@burl.UUCP (R. Curtis Jackson)
Newsgroups: net.bugs.usg
Subject: yacc and lex bugs
Message-ID: <439@burl.UUCP>
Date: Thu, 19-Apr-84 13:15:36 EST
Article-I.D.: burl.439
Posted: Thu Apr 19 13:15:36 1984
Date-Received: Fri, 20-Apr-84 01:38:17 EST
Organization: AT&T Technologies; Burlington, NC
Lines: 94

FIRST OFF -- AN APOLOGY:

I have been informed that the Unix Hotline folks processed my MR on
yacc(1) promptly. After sitting in Murray Hill for a year now, it is
considered "Under Investigation", and its status is "We'll postpone
judgement until a later date". The Hotline people did their job
admirably, and I am sorry I blasted them without having the MR checked
first.

1) yacc

   a) Problem (history):

      In the 'good old days' (V6), yacc would not tell you in its debug
      output that it had found 'token ADDOP'; it would tell you that it
      had found 'token 426', and it was up to you to find out (by using
      the -d option and looking at y.tab.h) what token 426 really was.
      So it was beneficial to define your own token numbers rather than
      letting yacc assign defaults; that way they were in your source
      file for easy access.

      Even today, if you have one lexical analyzer feeding two or more
      parsers with the same tokens, you want to make sure that the token
      numbers are the same in every one of those parsers, so this
      feature of yacc (being able to define your own token numbers) is
      still quite valid and useful.

   b) Problem:

      yacc uses tables of ints to transition from state to state, and
      those tables hold negative numbers: the negative of the token
      number, and ( -(the_next_desirable_state) - 1000 ). In other words,
      if you are to transition to state 53, the number in the table will
      be -1053. [ I am about 90% sure this is accurate; regardless, I do
      know the problem is related to this. ]
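      To make the suspected clash concrete, here is a minimal C sketch
      that assumes the table encoding exactly as described above (which
      the poster is only about 90% sure of): a token t is stored as -t,
      and "go to state s" is stored as -s - 1000. The names
      encode_token, encode_state, token_collides, and STATE_BIAS are
      hypothetical; they are not yacc's actual internals.

```c
/* Hypothetical model of the encoding described in the post (not yacc's
   real source): tokens are stored as -token, and state transitions as
   -(state) - STATE_BIAS. */
#define STATE_BIAS 1000

int encode_token(int token) { return -token; }
int encode_state(int state) { return -state - STATE_BIAS; }

/* Returns 1 if some state in [0, nstates) is stored as the same table
   entry as this token, so the parser could not tell the two apart;
   returns 0 otherwise. */
int token_collides(int token, int nstates)
{
    for (int s = 0; s < nstates; s++) {
        if (encode_token(token) == encode_state(s))
            return 1;
    }
    return 0;
}
```

      Under this model, token_collides(1053, 100) is 1 (token 1053 is
      stored as -1053, the same entry as "go to state 53"), while
      token_collides(426, 100) is 0, because every state entry is -1000
      or below. The model also puts the first clash at token 1000
      itself, which lands on state 0.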
      If you use token numbers > 1000, yacc itself will run perfectly
      and generate a proper y.output if you use the -v option, but when
      y.tab.c is compiled and executed, the results are totally
      unpredictable: the parser will transition to wildly inappropriate
      states and start generating 'Syntax error' messages at a
      phenomenal rate.

   c) Cure:

      Let yacc assign its default token numbers unless you absolutely
      cannot get around it. If you really need that feature, don't use
      token numbers over 1000.

      NOTE: remember to start your token numbers above the ASCII
      character codes, or yacc will think that your ADDOP, to which you
      have assigned a token number of 040, is a space, and vice versa.

      If you have to use your own token numbers *AND* you have so many
      tokens that you are running over 1000, then wade through the yacc
      code, find the define for that number, and increase it. (An
      extremely improbable situation.)

2) lex

   a) Problem:

      lex has an input character buffer called yysbuf that is
      dimensioned to YYLMAX, which is defined to be 200. Unfortunately,
      the routine that reads the input file [ yylook() ] does not, as
      far as I can tell, check that it has not gathered more than YYLMAX
      characters into yysbuf (or into yytext, which is also dimensioned
      to YYLMAX). If it is matching a pattern longer than YYLMAX
      characters, it writes them right past the end of yysbuf and on
      into 'The Memory Zone', usually producing Memory Faults or Bus
      Errors somewhere down the line.

   b) Cure:

      If you get a Memory Fault or Bus Error and cannot seem to locate
      it, put the following lines into the declarations section of your
      lex program:

         %{
         /* ... your own declarations ... */
         # undef YYLMAX
         # define YYLMAX 5000   /* or some other ridiculously large number */
         /* ... more of your own declarations ... */
         %}

      This will override lex's YYLMAX define. For details, see the
      lex(1) documentation concerning overriding lex's input() macro,
      and also look at the first 15 lines of any lex.yy.c.
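      A minimal C sketch of why the override works, plus the kind of
      length check the post says yylook() does not make. It assumes, as
      the post implies, that the %{ ... %} block is copied into lex.yy.c
      after lex's own YYLMAX define but before the buffers are
      declared; the two defines below stand in for those two places.
      guarded_copy is a hypothetical helper, not part of lex.

```c
#include <stddef.h>

/* Stand-in for lex.yy.c's own default, as described in the post. */
#define YYLMAX 200

/* Stand-in for the post's fix in the %{ ... %} block. Assumption: this
   text lands after lex's define and before the buffer declarations, so
   the larger value is the one used to size them. */
#undef YYLMAX
#define YYLMAX 5000

char yytext[YYLMAX];   /* now 5000 bytes instead of 200 */

/* Hypothetical guarded copy (NOT part of lex): store an n-character
   match in buf (capacity cap) only if it fits with room for the
   trailing NUL. Returns 0 on success, or -1 instead of writing past
   the end of buf. */
int guarded_copy(char *buf, size_t cap, const char *match, size_t n)
{
    if (cap == 0 || n >= cap)
        return -1;              /* would overrun buf: refuse */
    for (size_t i = 0; i < n; i++)
        buf[i] = match[i];
    buf[n] = '\0';
    return 0;
}
```

      With the override in place, sizeof yytext is 5000. A call such as
      guarded_copy(yytext, 200, match, 250) models the original 200-byte
      buffer. It refuses the 250-character match and returns -1 rather
      than running into 'The Memory Zone'.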
      If your Memory Fault/Bus Error goes away, then either:

         1) Your pattern specs for lex are out of line: you are not
            matching what you think you are matching. Check for rules
            containing things like [^x], where x is some character.
            Remember that rules like these match ANY character but x,
            including newlines.

         2) Your pattern specs are OK, but you are simply trying to
            match more than 200 characters. Use the above method to
            define YYLMAX to a reasonable number for your application
            and go on.

Hope this helps some people. Please direct any questions/comments to me
at the address below.
--
The MAD Programmer -- 919-228-3313 (Cornet 291)
alias: Curtis Jackson    ...![ ihnp4 ulysses cbosgd clyde ]!burl!rcj