Path: utzoo!attcan!uunet!tektronix!tekcrl!tekchips!stevev From: stevev@tekchips.CRL.TEK.COM (Steve Vegdahl) Newsgroups: comp.lang.c Subject: Re: YACC grammar for C language Summary: ugly problems during the semantic phase Keywords: YACC grammar for C language Message-ID: <3453@tekcrl.CRL.TEK.COM> Date: 11 Jan 89 18:36:50 GMT References: <175@calmasd.GE.COM> Sender: ftp@tekcrl.CRL.TEK.COM Lines: 72 In article <175@calmasd.GE.COM>, jhh@calmasd.GE.COM (Jung Hamel) writes: > > Does anybody have or know of YACC grammar for C that does > not require the lexical analyser to differentiate typedef > names from other identifier names. We have tried to "fix" a > grammar with this but always get an illegal grammar for YACC. > Our lexical analyser does not have access to the full set of > typedef names. This is not as simple as is might seem. Consider the following programs: ---------------- begin program 1 ---------------- typedef int foo; int a, int b(); myFunc() { a*b(); } ---------------- end program 1 ---------------- ---------------- begin program 2 ---------------- typedef int a; int foo, int baz(); myFunc() { a*b(); } ---------------- end program 2 ---------------- Assuming that typedef names are treated as identifiers, these programs are syntacticaly identical. (The definitions of "myFunc" are character-for-character identical.) Yet, we really want to parse the string "a*b();" differently in the two cases. In program 1 it is a multiplication of "a" by the result of the function call to "b", giving a parse something like: STATEMENT EXPR EXPR ID: a OP STAR: * EXPR ID: b SEMI: ; In program 2, we are declaring "b" to be a function returning a pointer to something of type "a", giving a parse something like DECLLIST ID: a DECLNAMES DECLNAME STAR: * ID: b SEMI: ; If we don't distinguish typedef names, say, in the scanner, we have no idea which way to parse the string "a*b();", so we have to do something like picking one of them, and possibly transforming the AST into other one once we know whether "a" is a typedef---typically during semantic analysis. (One can probably construct examples that are even more complicated than this one.) Thus, while it is possible to write a grammar C that treats identifiers as typedefs, this can only be done by pushing more work onto the semantic analysis phase. (Of course in the degenerate we could write a parser that accepted *any* token sequence, forcing the semantic analysis phase to do all the parsing!) Steve Vegdahl Computer Research Lab Tektronix Labs Beaverton, Oregon