Xref: utzoo comp.compilers:215 comp.lang.c:8404 comp.unix.questions:6162 Path: utzoo!utgpu!water!watmath!clyde!ima!johnl From: chris@mimsy.UUCP (Chris Torek) Newsgroups: comp.compilers,comp.lang.c,comp.unix.questions Subject: Re: LEX behaviour when given "large" automata. Message-ID: <10723@mimsy.UUCP> Date: 20 Mar 88 05:30:17 GMT References: <911@ima.ISC.COM> <914@ima.ISC.COM> Sender: johnl@ima.ISC.COM Reply-To: chris@mimsy.UUCP (Chris Torek) Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 63 Approved: compilers@ima.UUCP Several people suggested replacing lex description such as if return (IF); then return (THEN); else return (ELSE); {id} { symenter(yytext); return (ID); } with {id} { /* first try it as a keyword */ k = keyword_index(yytext); if (k >= 0) return (k); /* not a keyword, must be an identifier */ symenter(yytext); return (ID); } This may (probably does) help in lex, but it is clearly the wrong way around in almost all cases. The lexer has to traverse a DFA to decide that something is an ID. While it is at it it could easily tell whether it is a keyword instead. Obviously this makes the DFA table larger, but the result should run faster. It turns out (as Van Jacobson discovered) that lex uses a long subroutine---on the order of 200 lines---to do what is essentially described by the loop /* follow the DFA */ for (state = begin; state[c].v == c; c = *in++) state += state[c].n; (best case) or /* * Pick the right start state depending on whether we are at * the beginning of a line. */ state = newline ? hat_begin : begin; while (state[c].v == c) { /* follow the DFA */ state += state[c].n; c = *in++; /* remember action state (for right context) */ if (state->v) { a_s = state; a_c = in; } } in = a_c; if (a_s->n) { in -= a_s->n; /* back up over right context */ c = in[-1]; } newline = in[-2]=='\n'; /* remember whether we are now at ^ */ in[-1] = '\0'; /* kill residual */ (worst case). This pretty much handles everything except YYREJECT. I do not know whether the LBL `flex' (Fast LEX) handles YYREJECT now. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request