Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!samsung!usc!snorkelwacker!spdcc!ima!esegue!compilers-sender
From: peterd@cs.washington.edu (Peter C. Damron)
Newsgroups: comp.compilers
Subject: Re: right context in scanner generators (was Re: LEX and YACC)
Summary: here's an idea for right context in lex
Message-ID: <1989Nov13.203142.1842@esegue.segue.boston.ma.us>
Date: 13 Nov 89 20:31:42 GMT
References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us> <1989Nov12.041025.12451@esegue.segue.boston.ma.us> <1989Nov12.223642.14219@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: peterd@cs.washington.edu (Peter C. Damron)
Organization: University of Washington, Computer Science, Seattle
Lines: 42
Approved: compilers@esegue.segue.boston.ma.us

In article <1989Nov12.223642.14219@esegue.segue.boston.ma.us> boehm@flora.rice.edu (Hans Boehm) writes:
>  Trying to match the context expression separately is, at least in some
>theoretical sense, not very efficient.  If there is more than one possible
>regular expression that may match, it may be necessary to look for several
>different right contexts at several different starting positions.  This would
>involve backtracking within the recognition of a single token.  The current
>lex algorithm has the property that it needs to only simulate a single finite
>automaton, independent of the number of regular expressions in the
>specification, and independent of whether or not they include right context.
>It does need to back up in the input at the end of every token recognition to
>find the longest match, but this only happens once per token.  It's a simple
>scan to find the last time a the FA was in a final state.  Right context is
>implemented by filtering out certain final states. 

I don't know how lex matches right context, but here is my suggestion.

1.  Take all the regular expressions for a given meta-state, and tack on
    the right context, remembering where the division is.
2.  Build the finite automaton which has two types of final states,
    one for the end of a pattern and one for the end of a right context.
3.  Run the scanner over an input string until the automata hits an error
    state.  During the scan, push all the final states reached onto a stack
    along with the corresponding string position.
4.  Examine the stack to find the first (longest) end-of-pattern final state
    that also has a corresponding end-of-right-context final state.
5.  Restart the scan at the position of the end of pattern final state.

There is no need to build a reverse string matcher.  You do have to re-scan
the right context of the matching pattern.  However, you may be able to
build the automata in such a way that you can figure out the new final state
stack and the new starting state without any re-scan.  I have no idea how
difficult that might be to compute.

Hope this helps (and works),
--
Peter C. Damron, Dept. of Computer Science, FR-35
University of Washington, Seattle, WA  98195
peterd@cs.washington.edu, {ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.