Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!lotus!esegue!compilers-sender From: boehm@flora.rice.edu (Hans Boehm) Newsgroups: comp.compilers Subject: right context in scanner generators (was Re: LEX and YACC) Summary: The standard algorithm is buggy Message-ID: <1989Nov12.041025.12451@esegue.segue.boston.ma.us> Date: 12 Nov 89 04:10:25 GMT References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: boehm@flora.rice.edu (Hans Boehm) Organization: Rice University, Houston Lines: 31 Approved: compilers@esegue.segue.boston.ma.us A number of scanner generators, including lex, make the claim that they can handle regular expressions specifying right context. However, as was pointed out to me by a student in a compiler class I was teaching, the implementation that lex uses (namely that described in the Aho and Ullman text) is wrong. (I believe the same bug exists in Aho, Sethi, and Ullman, but I don't have a copy handy at the moment.) An example is: (a|ab)/ba on input "aba". Lex will consider "ab" to be the first token instead of "a". (It first looks for the concatenation of both regular expressions, then back- tracks to the rightmost occurrence of a final state corresponding to the first regular expression.) Are there any scanner generators that actually do this right? I believe a correct solution is to build a finite automaton for the reversal of the second regular expression, and run it during the backtracking process. This is not asymptotically worse than the standard (incorrect) algorithm, but it is more complicated than I would like. Hans-J. Boehm boehm@rice.edu [How about stepping across the matched token, trying to match the second expression; when it matches all the way to the end you've found the right context. It's never been clear to me how useful lex-style right context is. Anyone have some real examples? -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.