Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!usc!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: Weird grep bug ? (perl 3.0 pl8)
Message-ID: <6839@jpl-devvax.JPL.NASA.GOV>
Date: 19 Jan 90 05:33:37 GMT
References: <845@frankland-river.aaii.oz.au>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 93

In article <845@frankland-river.aaii.oz.au> pem@frankland-river.aaii.oz.au (pem) writes:
: I have noticed what seems to be a strange bug in grep.
: I am not quite sure what is going on -- the problem seems only to
: occur when I use a 'do ' statement to include a header.
:
: I was wondering if anyone else has seen this something like this before.
:
: Here is a small program which demonstrates the problem on my machine:
: (a sun 3/60 running perl 3.0 pl8)
:
: All it does is use the built in grep command to match at the start of
: an array of lines, returning a new array of matching lines.
: If I include 'getopts.pl', for example, the grep (wrongly) matches
: every line. Otherwise it works as I would have expected.
:
: -----------cut here-----------
: #!/usr/bin/perl
:
: # comment the next line and the program behaves fine!
: do 'getopts.pl';
:
: $" = "\n";
:
: @dump_lines = (
:     "f:/, 0", "f:/usr, 1",
:     "h:/, 3", "h:/usr, 4",
:     "f:/, 5", "f:/usr, 6",
:     "h:/, 7", "h:/usr, 8"
: );
:
: for (;;) {
:     print "\n(note: if you type 'f' you would expect to get only 4 matching lines)\n";
:     print "Which dump ? (name eg. 'f' or '?' or 'q' to quit) ";
:     chop($_ = <STDIN>);
:     if (/^[Qq]$/) {exit 0;}
:     elsif (/^\?$/) {print("@dump_lines"); next;}
:     /^\s*(\S+)/;
:     @entry = grep(/^$1/, @dump_lines);
:     print "found the following entries in the log:\n@entry\n";
: }

This is a subtle little semantic difficulty caused by an optimization.
The immediate cause of your problem is the use of $1 inside a pattern
that may invalidate the meaning of $1.  It's always a little dangerous
to do that sort of thing, especially on a pattern that's evaluated more
than once.  What if $1 contained ()?

In this case, there's no () in $1, so ordinarily you'd get away with it.
But the decision in pattern matching whether to remember a new $1, $2,
etc. is tied (currently, anyway) to whether it will remember $&, $` and
$'.  (The offsets for returning these are actually kept in retrieval
info for $0, of all places.  Which is why $0 gets clobbered by pattern
matches.  Someday I'll fix that.)  Anyway, if perl sees a $&, $` or $'
anywhere in your program, it assumes that it has to recreate $0, $1, etc.

But wait, you say, those variables don't occur, even in getopts.pl.
True.  But if the program contains an eval, perl has to assume that a
lot of variables might be there that it hasn't seen yet.  Now "do
FILENAME" is a kind of eval.  So when you included that line, perl had
to initialize space for $& et al.  And because it did that, the /^$1/
figured it had to set up for $& et al. to return the correct info.  So
it clobbered $1.  Cute, eh?

In the ordinary run of things, you'd get away with that, even so,
because the old $1 would be interpolated before the pattern (a run-time
pattern) was compiled.  But grep evaluates its first argument
repeatedly, and since it's a run-time pattern, it has to recompile the
pattern, so on the second array element, $1 is no longer valid.

The obvious quick fix is to make perl reset the scope to the outer
pattern match before each iteration of grep.  In fact, that'll be in
patch 9.

The obvious quick workaround is to put $1 into a temp variable and
interpolate that:

    ($which) = /^\s*(\S+)/;
    @entry = grep(/^$which/, @dump_lines);

That's probably more readable anyway.

If you know the grep is only going to happen once, it would behoove you
to add an 'o' modifier to avoid unnecessary recompilations of the pattern.
But since you have it in a loop, the possibility remains that you might
want to change it.  If you were going to be grepping many things, it
might be more efficient to use the 'o' modifier inside an eval:

    eval '@entry = grep(/^$which/o, @dump_lines)';

This just compiles the pattern once for the grep, but recompiles each
time the grep is run.  Of course, for a small list, who cares.

Larry
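
A minimal standalone sketch of the two workarounds described above, for
anyone who wants to try them outside the interactive loop.  It assumes a
current perl (it uses "my" and "use strict", which perl 3.0 did not
have) and a fixed input of 'f' in place of the STDIN prompt.  The sub
names find_entries and find_entries_once are made up for illustration
and appear nowhere in the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as the program quoted in the article.
my @dump_lines = (
    "f:/, 0", "f:/usr, 1",
    "h:/, 3", "h:/usr, 4",
    "f:/, 5", "f:/usr, 6",
    "h:/, 7", "h:/usr, 8",
);

# Workaround 1 (hypothetical helper name): copy the capture into a
# named variable first, so the pattern inside grep interpolates that
# variable rather than $1.
sub find_entries {
    my ($input, @lines) = @_;
    my ($which) = $input =~ /^\s*(\S+)/;
    return () unless defined $which;
    return grep(/^$which/, @lines);
}

# Workaround 2 (hypothetical helper name): the 'o' modifier inside a
# string eval.  The pattern is compiled once per eval, and the eval
# itself is recompiled on every call, so a new $which is picked up.
sub find_entries_once {
    my ($input, @lines) = @_;
    my ($which) = $input =~ /^\s*(\S+)/;
    return () unless defined $which;
    my @entry;
    eval '@entry = grep(/^$which/o, @lines); 1' or die $@;
    return @entry;
}

print join("\n", find_entries('f', @dump_lines)), "\n";
print join("\n", find_entries_once('f', @dump_lines)), "\n";
```

Both calls should list only the four lines beginning with "f:".  Calling
find_entries_once again with 'h' should switch to the "h:" lines, since
each call runs a fresh eval.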