Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!snorkelwacker!spdcc!ima!esegue!compilers-sender From: vern@cs.cornell.edu (Vern Paxson) Newsgroups: comp.compilers Subject: Re: right context in scanner generators (was Re: LEX and YACC) Message-ID: <1989Nov13.175316.466@esegue.segue.boston.ma.us> Date: 13 Nov 89 17:53:16 GMT References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us> <1989Nov12.041025.12451@esegue.segue.boston.ma.us> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: vern@cs.cornell.edu (Vern Paxson) Organization: Cornell Univ. CS Dept, Ithaca NY Lines: 32 Approved: compilers@esegue.segue.boston.ma.us In article <1989Nov12.041025.12451@esegue.segue.boston.ma.us> boehm@flora.rice.edu (Hans Boehm) writes: >A number of scanner generators, including lex, make the claim that >they can handle regular expressions specifying right context. >However, as was pointed out to me by a student in a compiler class >I was teaching, the implementation that lex uses (namely that described >in the Aho and Ullman text) is wrong. (I believe the same bug exists >in Aho, Sethi, and Ullman, but I don't have a copy handy at the moment.) >An example is: > >(a|ab)/ba > >on input "aba". Lex will consider "ab" to be the first token instead of "a". >... >Are there any scanner generators that actually >do this right? Note that flex kind of does this correctly--it allows x/y whenever it can get it right, i.e., if either x or y have fixed length or if both are variable length and the end of x will be unambiguous. It will correctly match the above example. If the example is changed to "(a|ab)/ba*" then it warns "Dangerous trailing context in rule at line ". Vern Vern Paxson vern@cs.cornell.edu Computer Science Dept. decvax!cornell!vern Cornell University vern@LBL (bitnet) [Vern is the author of flex. -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.