Xref: utzoo comp.compilers:213 comp.lang.c:8342 comp.unix.questions:6139
Path: utzoo!mnetor!uunet!husc6!necntc!ima!johnl
From: rsalz@BBN.COM (Rich Salz)
Newsgroups: comp.compilers,comp.lang.c,comp.unix.questions
Subject: Re: LEX behaviour when given "large" automata.
Message-ID: <914@ima.ISC.COM>
Date: 18 Mar 88 16:45:26 GMT
References: <911@ima.ISC.COM>
Sender: johnl@ima.ISC.COM
Reply-To: Rich Salz
Organization: BBN Laboratories, Cambridge MA
Lines: 40
Approved: compilers@ima.UUCP

In comp.compilers (<911@ima.ISC.COM>), phs@lifia.imag.fr (Philippe Schnoebelen) writes:
>    I'm having some problems with LEX. When my number of keywords/regexps is
>growing, the lexical analyzer begins to give strange, unexpected, (let's
>face it, wrong) results.

Lex ain't robust.  As a work-around, you can get real big savings in all of
    1.  creation time
    2.  running time
    3.  executable size
by going from one pattern per keyword to a single general pattern for
identifiers, and doing a table lookup on whatever it matches.  That is,
don't do this:
    for         return(FOR);
    if          return(IF);
    foo         return(FOO);
    [a-z]+      return(munchonit(yytext));

Do this:
    table[] = {
        { "for", FOR },
        { "if", IF },
        { NULL }
    };
    [a-z]+      {
                    for (p = table; p->name; p++)
                        if (strcmp(p->name, yytext) == 0)
                            return(p->value);
                    return(munchonit(yytext));
                }

(I left out all sorts of declarations and optimizations on the search loop.)

This is a real fun thing to do:  how often do you get to win on both sides
of the time-space tradeoff?
    /r$
[Similar suggestion from Eddie Wyatt edw@ius1.cs.cmu.edu]
[From Rich Salz]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
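
The table lookup in the post leaves out its declarations, so here is a minimal sketch of what they might look like in plain C. Everything named here is an assumption made for illustration: the struct layout, the TOK_* codes, and the function name classify_word do not come from the original article. The fall-through returns a generic identifier token where the post calls munchonit(yytext).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would get these from its parser. */
enum { TOK_IDENT = 256, TOK_FOR, TOK_IF, TOK_FOO };

/* One table entry: the keyword's spelling and the token it maps to. */
struct keyword {
    const char *name;   /* NULL marks the end of the table */
    int value;          /* token code returned for this keyword */
};

static const struct keyword table[] = {
    { "for", TOK_FOR },
    { "if",  TOK_IF  },
    { "foo", TOK_FOO },
    { NULL,  0       }
};

/*
 * Linear search of the keyword table, as in the post's loop.
 * Returns the keyword's token code, or TOK_IDENT when the text
 * is not a keyword (standing in for the post's munchonit()).
 */
int classify_word(const char *text)
{
    const struct keyword *p;

    for (p = table; p->name != NULL; p++)
        if (strcmp(p->name, text) == 0)
            return p->value;
    return TOK_IDENT;
}
```

With something like this in the definitions section, the whole keyword set could collapse to one lex rule along the lines of `[a-z]+ { return classify_word(yytext); }`. The linear scan is the simplest possible search; for a large keyword set a sorted table with bsearch() or a hash would be the obvious next step, which is presumably the kind of optimization the post says it left out.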