Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!wuarchive!udel!haven!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: yacc sorrows
Message-ID: <22529@mimsy.umd.edu>
Date: 14 Feb 90 19:39:27 GMT
References: <7179@arcturus> <1990Feb9.171557.18465@tcsc3b2.tcsc.com>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 69

(Incidentally, this is another thing that does not really belong in
comp.lang.c, but in this case there *is* no appropriate group, so I
have not attempted to redirect followups....)

A few minor points:

In article <1990Feb9.171557.18465@tcsc3b2.tcsc.com> prs@tcsc3b2.tcsc.com
(Paul Stath) writes:
>The string that gets matched in LEX is stored in a character pointer called
>`yytext'.

Actually, this is an array (of size YYLMAX, typically 200) of characters,
not a pointer.

[example lex code]
>${alpha}{alphanum}*     {
>                        yylval.str=malloc(strlen(yytext)+1);
>                        strcpy(yylval.str, yytext);
>                        return (Identifier);
>                        }

It is not actually necessary to call malloc() here, as the characters
in yytext[] will be left undisturbed until the next call to yylex().
The string saving, if necessary, can be deferred to the parser.  One
useful trick is to have a parse rule like save_id:

        %type save_id
        %token ID
        %%
        save_id: ID { $$ = savestr($1); };

Then, whenever you need an ID that must be saved from destruction by
the next call to yylex(), you can use save_id instead of ID.

Another different trick (which I have used in some hand-coded lexers)
is to save all strings in hash tables, possibly reference counted
(depending on whether many should be freed later).

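The hash-table trick might look roughly like the sketch below.  This is
only an illustration under stated assumptions, not code from any real
lexer: the names intern_str, hash_str, struct entry and NBUCKETS are all
made up here, the table size is arbitrary, and no reference counting is
attempted.  Each distinct string is copied once; later requests for the
same text return the copy already saved.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211            /* hypothetical table size (a small prime) */

struct entry {                  /* hypothetical bucket-chain node */
    struct entry *next;         /* next entry in the same bucket */
    char *text;                 /* the single saved copy of the string */
};

static struct entry *buckets[NBUCKETS];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 0;

    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h;
}

/*
 * Return the saved copy of s, creating it the first time s is seen.
 * Exits on out-of-space, so callers never get NULL back.
 */
const char *intern_str(const char *s)
{
    struct entry **head;
    struct entry *e;

    assert(s != NULL);
    head = &buckets[hash_str(s) % NBUCKETS];

    for (e = *head; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;     /* seen before: reuse the old copy */

    e = malloc(sizeof *e);
    if (e == NULL || (e->text = malloc(strlen(s) + 1)) == NULL) {
        fprintf(stderr, "out of space saving \"%s\"\n", s);
        exit(EXIT_FAILURE);
    }
    strcpy(e->text, s);
    e->next = *head;            /* push onto the front of the chain */
    *head = e;
    return e->text;
}
```

A lexer action could pass yytext to such a routine before returning the
token, since the returned pointer stays valid after yytext[] is
overwritten.  A side effect is that two identifiers with the same
spelling compare equal as pointers.
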
In any case, a routine that calls malloc() should check for no-space:
instead of

        yylval.str=malloc(strlen(yytext)+1);
        strcpy(yylval.str, yytext);

you need something like

        yylval.str = malloc(strlen(yytext) + 1);
        if (yylval.str == NULL)
                die_horribly_due_to_running_out_of_space();
        strcpy(yylval.str, yytext);

or more simply

        yylval.str = estrdup(yytext);

where estrdup is like strdup, but errors out if out of space.  (strdup
is a common library function that acts like malloc+strcpy, returning
NULL if out of space.)

>LEX and YACC are powerful tools which IMHO are poorly documented.

The real documentation for both of these tools is found in compiler
courses and in compiler textbooks, not in the supplementary Unix
documents.  The latter assume you know what LALR parsing and regular
expressions are all about, and merely tell you how to tell yacc and
lex what syntax rules and regular expressions to recognise, and what
actions to take on recognition.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu        Path:   uunet!mimsy!chris