Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!plx!evan
From: evan@plx.UUCP (Evan Bigall)
Newsgroups: comp.unix.questions
Subject: Re: YACC question
Message-ID: <2210@plx.UUCP>
Date: 16 Jan 90 00:31:28 GMT
References: <8ZgGBgG00VsnQDgUNO@andrew.cmu.edu>
Reply-To: evan@plx.UUCP (Evan Bigall)
Organization: Plexus Computers; San Jose, CA
Lines: 48

>
>    expr:       mulexpr PLUS mulexpr
>        | mulexpr MINUS mulexpr
>
>It's very straightforward; the yylex() routine must be written to return
>the constant PLUS when it encounters a '+' in the input, and the
>constant MINUS when it encounters a '-' in the input.  However, Yacc
>allows you to rewrite the above fragment as
>
>    expr:       mulexpr '+' mulexpr
>        | mulexpr '-' mulexpr
>
>My question is, where does Yacc find the '+' and the '-' characters? 
>Apparently they're not gotten via a call to yylex().  Does Yacc simply
>do a getchar()?

Quoting from the yacc section of my sys5.2 "Support Tool Guide":

}	The rules section is made up of one or more grammar rules.  A grammar
}rule has the form 
}
}A : BODY ;
}
}where "A" represents a nonterminal name, and "BODY" represents a sequence of
}zero or more names and LITERALS {my emphasis}.  The colon and the semicolon
}are yacc punctuation. 

{later it says:}

}A literal consists of a character enclosed in single quotes (').  As in C
}language, the backslash (\) is an escape character within literals....

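Really all that is going on here is that yacc is using the value of the
character literal as the token number.  So no, yacc does not do a getchar();
the '+' and '-' still come from yylex(), which has to return the character
itself as the token.  This is why the yacc generated token numbers start at
257 (on machines with "normal" char sets): everything below that is reserved
for character literals.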

The standard way to represent this as a lex rule is:

.                      	return(*yytext);

to return a literal for all characters not recognized by another rule.
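
If you are writing yylex() by hand instead of using lex, the same idea looks
roughly like the sketch below.  This is only an illustration, not anything
out of the manual: NUMBER, set_input() and the string buffer are names I made
up for the example, and a real grammar would get its token #defines from
y.tab.h rather than defining them itself.

```c
#include <ctype.h>

/* Hypothetical named token.  yacc would normally generate this #define
 * (in y.tab.h); 257 is used here only because named tokens start there. */
#define NUMBER 257

int yylval;                     /* semantic value of the last NUMBER */
static const char *input = "";  /* hypothetical input buffer for the sketch */

/* Hypothetical helper: point the lexer at a string to scan. */
void set_input(const char *s) { input = s; }

int yylex(void)
{
    /* Skip blanks and tabs between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    /* yacc treats a token number of 0 as end of input. */
    if (*input == '\0')
        return 0;

    /* A run of digits becomes the named token NUMBER. */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }

    /* Anything else is returned as itself, so '+' in the grammar matches
     * the character '+' here.  Same effect as the lex rule above. */
    return (unsigned char)*input++;
}
```

With this, a rule such as  expr: mulexpr '+' mulexpr  matches whenever
yylex() hands back the character code for '+'; no PLUS token is needed.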

Evan


-- 
Evan Bigall, Plexus Software, Santa Clara CA (408)982-4840  ...!sun!plx!evan
"I barely have the authority to speak for myself, certainly not anybody else"