Path: utzoo!attcan!uunet!samsung!munnari.oz.au!metro!cluster!ultima!hades!greyham From: greyham@hades.OZ (Greyham Stoney) Newsgroups: comp.lang.perl Subject: How can I scan one file with a list of RE's from another efficently?. Message-ID: <694@hades.OZ> Date: 24 May 90 02:44:46 GMT Sender: greyham@hades.OZ Organization: Ausonics Pty Ltd, Sydney, Australia Lines: 102 A question for all ye perl hackers: Our site recieves a subset of a full news feed. I'd like to generate a report based on the checkgroups message and/or the newsgroups file to say which groups we actually recieve, and which groups we don't. Groups that we don't recieve are listed in a file named "dropped", which is actually a list of regular expressions (bionet.*, vmsnet.* for example) which match all the groups we don't want. I want the report to spit out the checkgroups message in two groups; newgroups we get, and newsgroups we don't get. So it basically scans the checkgroups message with a list of Regular Expressions from another file. I've put together a perl script which does the job, but BOY is it slow!; I imagine becuase it has to compile that RE thousands of times. Can anyone of severe perl guru wizard status suggest a better way of doing it? [ doesn't have to use perl, I'm easy ]. I could just use fgrep -f, but the list of groups dropped is too long for it to handle, and they're RE's anyway. thanks, Greyham. --------------------------------- CUT HERE ------------------------------ #!/usr/local/bin/perl # provide a report (from checkgroups) as to what newsgroups we still get, # and what ones we don't get. # slurp in the 'dropped' file. open(DROPPED, 'dropped'); @dropped = ; close(DROPPED); chop (@dropped); # nuke the \n off the end of each line. # print it, just for checking. #print @dropped; #print $#dropped; # slurp in the 'checkgroups' file (it's a news article). open(CHECKGROUPS,'checkgrps.msg'); # skip the header business. while () { if (/^$/) { last; } } @checkgroups = ; close (CHECKGROUPS); # print it, just for checking. #print @checkgroups; #print $#checkgroups; # go down each message in the checkgroups, and find whether we get it or not. for ($group = 0; $group <= $#checkgroups; $group++) { #print $checkgroups[$group], "\n"; # see if this group is matched by anything in dropped. for ($drop = 0; $drop <= $#dropped; $drop++) { #print $checkgroups[$group], $dropped[$drop],"\n"; if ($checkgroups[$group] =~ /^$dropped[$drop]/) { $nogo[$group] = 1; last; } } } print "***** The Following is a list of groups that we DO get:\n"; # spin down saying what we DO get: for ($group = 0; $group <= $#checkgroups; $group++) { if (!$nogo[$group]) { print $checkgroups[$group]; } } print "\n\n\n***** The Following is a list of groups that we DO NOT get:\n"; # spin down saying what we DONT get: for ($group = 0; $group <= $#checkgroups; $group++) { if ($nogo[$group]) { print $checkgroups[$group]; } } --------------------------------- CUT HERE ------------------------------- -- /* Greyham Stoney: Australia: (02) 428 6476 * * greyham@hades.oz - Ausonics Pty Ltd, Lane Cove, Sydney, Oz. */