Path: utzoo!attcan!uunet!tektronix!tekcrl!tekchips!stevev
From: stevev@tekchips.CRL.TEK.COM (Steve Vegdahl)
Newsgroups: comp.lang.c
Subject: Re: YACC grammar for C language
Summary: ugly problems during the semantic phase
Keywords: YACC grammar for C language
Message-ID: <3453@tekcrl.CRL.TEK.COM>
Date: 11 Jan 89 18:36:50 GMT
References: <175@calmasd.GE.COM>
Sender: ftp@tekcrl.CRL.TEK.COM
Lines: 72

In article <175@calmasd.GE.COM>, jhh@calmasd.GE.COM (Jung Hamel) writes:
> 
>    Does anybody have or know of YACC grammar for C that does
> not require the lexical analyser to differentiate typedef
> names from other identifier names. We have tried to "fix" a
> grammar with this but always get an illegal grammar for YACC.
> Our lexical analyser does not have access to the full set of
> typedef names.

This is not as simple as is might seem.  Consider the following programs:

---------------- begin program 1 ----------------
typedef int foo;
int a, int b();

myFunc()
{
a*b();
}
---------------- end program 1 ----------------

---------------- begin program 2 ----------------
typedef int a;
int foo, int baz();

myFunc()
{
a*b();
}
---------------- end program 2 ----------------

Assuming that typedef names are treated as identifiers, these programs
are syntacticaly identical.  (The definitions of "myFunc" are
character-for-character identical.)  Yet, we really want to parse the
string "a*b();" differently in the two cases.  In program 1 it is a
multiplication of "a" by the result of the function call to "b", giving a
parse something like:
  STATEMENT
   EXPR
    EXPR
     ID: a
    OP
     STAR: *
    EXPR
     ID: b
   SEMI: ;
In program 2, we are declaring "b" to be a function returning a pointer to
something of type "a", giving a parse something like
  DECLLIST
   ID: a
   DECLNAMES
    DECLNAME
    STAR: *
    ID: b
   SEMI: ;

If we don't distinguish typedef names, say, in the scanner, we have no idea
which way to parse the string "a*b();", so we have to do something like
picking one of them, and possibly transforming the AST into other one once
we know whether "a" is a typedef---typically during semantic analysis.  (One
can probably construct examples that are even more complicated than this one.)

Thus, while it is possible to write a grammar C that treats identifiers as
typedefs, this can only be done by pushing more work onto the semantic
analysis phase.  (Of course in the degenerate we could write a parser that
accepted *any* token sequence, forcing the semantic analysis phase to do all
the parsing!)

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon