Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!snorkelwacker!bloom-beacon!bloom-beacon!lfk
From: lfk@athena.mit.edu (Lee F Kolakowski)
Newsgroups: comp.lang.perl
Subject: FSM's for large regex's in Perl or Awk
Message-ID: <1990Jul5.225930.13089@athena.mit.edu>
Date: 5 Jul 90 22:58:51 GMT
Sender: news@athena.mit.edu (News system)
Distribution: comp
Organization: Mass. Inst. of Tech., Dept. of Chemistry
Lines: 62


Hello out there in the land of perls.

I am thinking of moving an application from awk to perl, but wonder
what kind of speedup is possible, and whether I can make a
pseudo-finite state machine out of the problem.

I have a file with 350 regexes, which is expected to grow to 500 or
more by year's end.

I want to search for all of these on each input line. Currently, I read
the regexes into an array and loop through all of them, doing a match
each time. If one matches (I know where from the RSTART var), I cut off
the input up to that point and try the match again.

So my code looks like this:

{
  if (FILENAME == "regex file") {
    # pattern file (read first): accession, regex and name on each line
    accession[NR] = $1 ; regex[NR] = $2; name[NR] = $3; regex_num = NR;
  }
  else {
    n = length($0)
    for (i = 1; i <= regex_num ; i++ ) {
      if (match($0, regex[i])) {

#	if it matches do stuff, then cut off the
#	beginning of string up to the end of the match and try again

	new = substr($0, RSTART + RLENGTH)
	while (match(new, regex[i])) {

#	do some of the same stuff above

	  new = substr(new, RSTART + RLENGTH)
	}
      }
    }
  }
}


So the question is: can I build an FSM using perl for this kind of
thing?
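One way to get something like a single machine for all the patterns,
even before moving to perl, is to glue every regex into one
alternation, (re1)|(re2)|..., so that match() is called once per hit
on a line instead of once per pattern. What follows is only a minimal
sketch in awk (wrapped in a small sh function), under assumptions: the
pattern file is assumed to have the same accession/regex/name layout
as the code above, it must be the first file named and must not be
empty (the NR == FNR test depends on that), and the name scan_all is
made up. Whether the alternation really becomes one finite automaton
depends on the awk's regex library; the same idea carries over to perl
by joining the patterns with '|' into one regex, although perl's
matcher is a backtracking one rather than a DFA.

```sh
#!/bin/sh
# scan_all PATTERNFILE INPUTFILE...   (hypothetical helper, a sketch)
# Fold every pattern into ONE alternation so match() scans each line
# once per hit, not once per pattern.
# Assumed pattern-file layout: accession regex name
scan_all() {
  awk '
    NR == FNR {                   # first file: the pattern list
      n++
      acc[n] = $1; re[n] = $2; nm[n] = $3
      if (n > 1) big = big "|"
      big = big "(" $2 ")"        # build (re1)|(re2)|...
      next
    }
    {                             # every other file: the data
      line = $0
      while (line != "" && match(line, big)) {
        s = RSTART; l = RLENGTH   # save: match() below overwrites them
        hit = substr(line, s, l)
        # find which pattern produced the hit (only runs on a hit)
        for (i = 1; i <= n; i++)
          if (match(hit, "^(" re[i] ")$")) {
            print FNR ": " acc[i] " " nm[i] " " hit
            break
          }
        # cut off the line through the end of the hit and try again;
        # step at least one character so an empty match cannot loop
        line = substr(line, s + (l > 0 ? l : 1))
      }
    }
  ' "$@"
}
```

Note that the semantics differ from the per-regex loop: the alternation
reports leftmost-longest, non-overlapping hits, so a match of one
pattern that overlaps a hit of another is not reported, and patterns
containing ^ or $ change meaning inside the alternation. The inner loop
over the patterns only runs on text that has already matched, which is
where any saving would come from when most lines hit few patterns.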

Wondering....


--

Frank Kolakowski 

======================================================================
|lfk@athena.mit.edu                     ||      Lee F. Kolakowski    |
|lfk@eastman2.mit.edu                   ||	M.I.T.		     |
|kolakowski@wccf.mit.edu                ||	Dept of Chemistry    |
|lfk@mbio.med.upenn.edu		        ||	Room 18-506	     |
|lfk@hx.lcs.mit.edu                     ||	77 Massachusetts Ave.|
|AT&T:  1-617-253-1866                  ||	Cambridge, MA 02139  |
|--------------------------------------------------------------------|
|                         #include <woes.h>         		     |
|		           One-Liner Here!                           |
======================================================================