Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Counting RE occurrences
Keywords: regexps, /g, /n, split, arrays, APL, invert
Message-ID: <1991May17.132403.12104@convex.com>
Date: 17 May 91 13:24:03 GMT
References: <1991May13.184504.13844@demon.co.uk> <1991May13.225603.29819@convex.com> <1991May16.010149.20536@uunet.uu.net>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 130
Nntp-Posting-Host: pixel.convex.com

From the keyboard of rbj@uunet.uu.net (Root Boy Jim):
:>:I have a string, which contains a piece of text. I also have a regular
:>:expression. I want to count the number of times the RE appears in the
:>:string.
:
:Quite simply, the answer is: split(/RE/,exp) - 1;
:
:I don't know why Tom missed the easy answer after giving the hard ones.

Funny you should mention that. Believe it or not, I just came back from
thinking about all this a bunch, and was about to post the split()
solution, and here you'd gone and beaten me to it.

There are a couple of problems with using split for this. I think it has
more overhead than it needs to have. If all you want is the count of the
exprs, there's no reason to go making all those @_ values that you'll be
creating as a side-effect of the split. If you could say:

    $count = /regexp/g;

it would not need to create all those values, and it seems more
intuitive. Of course, if you said:

    @array = /stuff (regexp)/g;

this is effectively the same as

    @array = grep($i++%2, split(/stuff (regexp)/));

except that once again, it's not utterly intuitive and will go making
more tmp values than it really needs to -- although only twice as many.

I've also been thinking more about

    while (/foo/) {

and somehow making that an iterator that starts the match from where it
left off.
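The counting question is easy to check in Perl 5, which postdates this thread. What follows is a minimal sketch, assuming a modern perl with strict and warnings; count_split, count_g, and match_ends are hypothetical helper names, not anything proposed in the thread. It shows that split() discards trailing empty fields, so split(/RE/, $str) - 1 comes out one low whenever the RE matches at the very end of the string; that the $n = () = $str =~ /RE/g idiom counts the matches; and that a scalar-context //g in a while loop resumes from pos(), which is close to the resume-where-it-left-off iterator being wished for here.

```perl
use strict;
use warnings;

# Hypothetical helper: the split-based count from the thread.
# split() discards trailing empty fields, so a match at the very end
# of the string yields no extra field and the count comes out one low.
sub count_split {
    my ($str, $re) = @_;
    my @fields = split $re, $str;
    return @fields - 1;
}

# Hypothetical helper: list-context //g returns every match; assigning
# that list to the empty list () in scalar context yields the number
# of elements on the right-hand side, i.e. the number of matches.
sub count_g {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

# Hypothetical helper: scalar-context //g resumes from pos($str),
# the offset just past the previous match, until a match fails.
sub match_ends {
    my ($str, $re) = @_;
    my @ends;
    while ($str =~ /$re/g) {
        push @ends, pos($str);
    }
    return @ends;
}

printf "%d %d\n", count_split("banana", qr/a/), count_g("banana", qr/a/);  # prints "2 3"
print join(",", match_ends("banana", qr/a/)), "\n";                        # prints "2,4,6"
```

For "banana" and /a/ the split version reports 2 although there are three matches: the final "a" leaves only an empty trailing field, which split drops.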
I think a decent way to build such an iterator would be to use a /n flag
indicating "next match". Thus the syntax would be //n, or m//n, not n//.
It's really still a match, just with a special variation, so it doesn't
particularly merit an entirely new operator. Perl would keep a pointer
into the string being matched against, advancing it with each match until
it ran out.

    while ($foo =~ /bar/n) {

is certainly one possibility, but another possible use would be:

    if (/foo/ && /bar/n && /baz/n)

which might be faster than

    if (/foo.*bar.*baz/)

A question is what you do on failure. For example, does this make sense:

    if (/foo/) {
        if (/bar/n) {
        } elsif (/baz/n) {
        }
    }

If the /bar/n failed, could the /baz/n search start from the same place
as the /bar/n started?

Another question is when to reset your state. Do you have to know when
the variable you're matching against has been written? Do you reset every
time the variable is matched against without the /n switch?

On further contemplation, I think for efficiency you'd want to make the
user put in a /n if he ever wanted to do a next match. Otherwise it'd be
too much overhead. That makes the above fragment look like this:

    if (/foo/n) {
        if (/bar/n) {
        } elsif (/baz/n) {
        }
    }

I still don't know when to reset the state. And does /n make sense for
the s/// operator? I think this /n switch needs a bit more thought and
discussion, maybe from some of you who've done more complex pattern
operations in other languages.

The /g switch, on the other hand, seems much more straightforward and
could work just as I've described it above without shocking anyone.

Larry, what's your take on all this?

:>I know, I know... along that road lies APL and madness.
:
:Too late. Perl is already weirder than APL. Uglier too.

Oh good, does that mean we'll get

    @a += @b;
    @c = @a + @b;

one of these days then? :-)

Speaking of array operations, consider this.
You have an array of colors and values, as from the perl man page:

    %map = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);

so that $map{'red'} == 0x00f and so on. What if you want to invert the
array so you can look up $nmap{0x00f}? Well, certainly you can do this
semi-awkishly:

    for $color (keys %map) {
        $nmap{$map{$color}} = $color;
    }

or even this in a lispy, semi-mapcar kind of way:

    grep( $nmap{$map{$_}} = $_, keys %map );

but even grep is too much work. I think the true perl idiom is

    @nmap{values %map} = keys %map;

which works just fine, is quite obvious about what it's doing, and seems
more in line with the Perlian Way. I don't believe I've ever seen anyone
do that before.

--tom
--
Tom Christiansen    tchrist@convex.com    convex!tchrist
        "So much mail, so little time."
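The hash-slice inversion in the post above can be checked with a short Perl 5 script. This is a minimal sketch, assuming a modern perl with strict and warnings; the names %rmap, %dup, and %inv are illustrative additions, as is the reverse() variant, which is a standard Perl 5 idiom rather than something from this post. The slice relies on keys and values walking an unmodified hash in the same order. The script also shows one caveat the slice shares with the loop and grep versions: if two colors had the same value, only one of them would survive the inversion.

```perl
use strict;
use warnings;

my %map = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);

# Hash-slice inversion from the post: keys and values traverse the
# hash in the same order as long as it isn't modified in between.
my %nmap;
@nmap{values %map} = keys %map;

# Same result via reverse(): %map flattens to (key, value, ...) and
# the reversed list reads back as (value, key, ...) pairs.
my %rmap = reverse %map;

# 0x00f is the number 15, so both lookups use the hash key "15".
print $nmap{0x00f}, " ", $rmap{0x00f}, "\n";   # prints "red red"

# Caveat: duplicate values collapse to one key; which original key
# survives depends on hash order.
my %dup = (a => 1, b => 1);
my %inv;
@inv{values %dup} = keys %dup;
print scalar(keys %inv), "\n";                 # prints "1"
```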