Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!samsung!usc!snorkelwacker!spdcc!ima!esegue!compilers-sender From: peterd@cs.washington.edu (Peter C. Damron) Newsgroups: comp.compilers Subject: Re: right context in scanner generators (was Re: LEX and YACC) Summary: here's an idea for right context in lex Message-ID: <1989Nov13.203142.1842@esegue.segue.boston.ma.us> Date: 13 Nov 89 20:31:42 GMT References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us> <1989Nov12.041025.12451@esegue.segue.boston.ma.us> <1989Nov12.223642.14219@esegue.segue.boston.ma.us> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: peterd@cs.washington.edu (Peter C. Damron) Organization: University of Washington, Computer Science, Seattle Lines: 42 Approved: compilers@esegue.segue.boston.ma.us In article <1989Nov12.223642.14219@esegue.segue.boston.ma.us> boehm@flora.rice.edu (Hans Boehm) writes: > Trying to match the context expression separately is, at least in some >theoretical sense, not very efficient. If there is more than one possible >regular expression that may match, it may be necessary to look for several >different right contexts at several different starting positions. This would >involve backtracking within the recognition of a single token. The current >lex algorithm has the property that it needs to only simulate a single finite >automaton, independent of the number of regular expressions in the >specification, and independent of whether or not they include right context. >It does need to back up in the input at the end of every token recognition to >find the longest match, but this only happens once per token. It's a simple >scan to find the last time a the FA was in a final state. Right context is >implemented by filtering out certain final states. I don't know how lex matches right context, but here is my suggestion. 1. Take all the regular expressions for a given meta-state, and tack on the right context, remembering where the division is. 2. Build the finite automaton which has two types of final states, one for the end of a pattern and one for the end of a right context. 3. Run the scanner over an input string until the automata hits an error state. During the scan, push all the final states reached onto a stack along with the corresponding string position. 4. Examine the stack to find the first (longest) end-of-pattern final state that also has a corresponding end-of-right-context final state. 5. Restart the scan at the position of the end of pattern final state. There is no need to build a reverse string matcher. You do have to re-scan the right context of the matching pattern. However, you may be able to build the automata in such a way that you can figure out the new final state stack and the new starting state without any re-scan. I have no idea how difficult that might be to compute. Hope this helps (and works), -- Peter C. Damron, Dept. of Computer Science, FR-35 University of Washington, Seattle, WA 98195 peterd@cs.washington.edu, {ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.