Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!snorkelwacker!bloom-beacon!bloom-beacon!lfk
From: lfk@athena.mit.edu (Lee F Kolakowski)
Newsgroups: comp.lang.perl
Subject: FSM's for large regex's in Perl or Awk
Message-ID: <1990Jul5.225930.13089@athena.mit.edu>
Date: 5 Jul 90 22:58:51 GMT
Sender: news@athena.mit.edu (News system)
Distribution: comp
Organization: Mass. Inst. of Tech., Dept. of Chemistry
Lines: 62

Hello out there in the land of perls.

I am thinking of moving an application from awk to perl, but wonder
what kind of speed-up is possible, and whether I can make a
pseudo-finite state machine out of the problem.

I have a file with 350 regexes, which is expected to grow to 500 or
more by year's end. I want to search for all of these on each input
line.

Currently, I read the regexes into an array and loop through all of
them, doing a match each time. If one matches (I know where from the
RSTART var), I cut off the input up to that point and try the match
again.

So my code looks like this:

    {
        if (FILENAME == "regex file") {
            # first file: load the table of patterns
            accession[NR] = $1
            regex[NR] = $2
            name[NR] = $3
            regex_num = NR
        } else {
            n = length($0)
            for (i = 1; i <= regex_num; i++) {
                if (match($0, regex[i])) {
                    # if it matches do stuff, then cut off the
                    # beginning of the string up to the match and try again
                    # (resuming just past the match is assumed here)
                    new = substr($0, RSTART + RLENGTH)
                    while (match(new, regex[i])) {
                        # do some of the same stuff as above
                        new = substr(new, RSTART + RLENGTH)
                    }
                }
            }
        }
    }

So the question is: can I build an FSM using perl for this kind of
thing?

Wondering....
--
Frank Kolakowski

======================================================================
|lfk@athena.mit.edu           ||      Lee F. Kolakowski              |
|lfk@eastman2.mit.edu         ||      M.I.T.                         |
|kolakowski@wccf.mit.edu      ||      Dept of Chemistry              |
|lfk@mbio.med.upenn.edu       ||      Room 18-506                    |
|lfk@hx.lcs.mit.edu           ||      77 Massachusetts Ave.          |
|AT&T: 1-617-253-1866         ||      Cambridge, MA 02139            |
|--------------------------------------------------------------------|
|                             #include                               |
|                          One-Liner Here!                           |
======================================================================
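
For comparison, the match-then-cut-off loop described above might look roughly like this in Perl. It is only a sketch under assumptions: the pattern table, the `find_matches` name, and the sample input are hypothetical, and it uses `qr//`, `$-[0]`, and lexical variables, which postdate the Perl of this post. It is not a true finite state machine. It just compiles each regex once and lets a `//g` match in a `while` loop resume after the previous match, so the string does not have to be trimmed by hand.

```perl
use strict;
use warnings;

# Hypothetical pattern table: [accession, regex source, name].
# In the post these come from the three columns of the regex file.
my @patterns = (
    [ 'P0001', 'a+b',    'ab-run' ],
    [ 'P0002', '[0-9]+', 'digits' ],
);

# Compile each regex once, up front, rather than once per input line.
my @compiled = map {
    my ($acc, $src, $name) = @$_;
    [ $acc, qr/$src/, $name ];
} @patterns;

# Return every match of every pattern in $line as
# [accession, name, 0-based start offset, matched text].
sub find_matches {
    my ($line) = @_;
    my @hits;
    for my $p (@compiled) {
        my ($acc, $re, $name) = @$p;
        # With //g in scalar context, each iteration resumes after the
        # end of the previous match; this stands in for the manual
        # "cut off the beginning and match again" step.
        while ($line =~ /$re/g) {
            push @hits, [ $acc, $name, $-[0], $& ];
        }
    }
    return @hits;
}

# Example use on a made-up input line.
for my $hit (find_matches('xxaab12yab345')) {
    print join("\t", @$hit), "\n";
}
```

This still does one pass over the line per pattern, so the cost grows with the number of regexes; whether it beats the awk version on 350 to 500 patterns would have to be measured.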