Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Counting RE occurrences
Keywords: regexps, /g, /n, split, arrays, APL, invert
Message-ID: <1991May17.132403.12104@convex.com>
Date: 17 May 91 13:24:03 GMT
References: <1991May13.184504.13844@demon.co.uk> <1991May13.225603.29819@convex.com> <1991May16.010149.20536@uunet.uu.net>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 130
Nntp-Posting-Host: pixel.convex.com

From the keyboard of rbj@uunet.uu.net (Root Boy Jim):
:>:I have a string, which contains a piece of text. I also have a regular
:>:expression. I want to count the number of times the RE appears in the
:>:string. 
:
:Quite simply, the answer is: split(/RE/,exp) - 1;
:
:I don't know why Tom missed the easy answer after giving the hard ones.

Funny you should mention that.  Believe it or not, I just came back from
thinking about all this a bunch, and was about to post the split()
solution, and here you'd gone and beaten me to it.

There are a couple of problems in using split for this.  I think it
has more overhead than it needs to have.  If all you want is the
count of the exprs, there's no reason to go making all those @_ 
values that you'll be creating as a side-effect of the split.
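
For reference, a minimal sketch of the split-based count.  The sample
string and pattern are made up, and the -1 limit is an added assumption:
without it split drops trailing empty fields, so a match at the very end
of the string would go uncounted.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the thread.
my $text = "abcXdefXghiX";

# Limit -1 keeps trailing empty fields, so the final "X" is counted too.
my @fields = split /X/, $text, -1;

# N separators produce N+1 fields, hence the "- 1".
my $split_count = @fields - 1;

print "$split_count\n";
```

Note that the whole @fields list gets built just to be counted, which is
the overhead complained about above.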

If you could say:

    $count = /regexp/g;

it would not need to create all those values, and seems more intuitive.  
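
For comparison, a sketch of how the count can be written in Perl versions
that do accept /g on a match (an assumption about later releases, not the
perl this post was written against).  There a scalar-context match with /g
only reports success or failure, so the count comes from forcing list
context.  The sample string and variable names are made up.

```perl
use strict;
use warnings;

my $text = "abcXdefXghiX";   # hypothetical sample string

# The empty list () forces list context on the match; assigning that
# list assignment to a scalar yields the number of matches.
my $g_count = () = $text =~ /X/g;

# Same count with an explicit loop, one match per iteration.
my $loop_count = 0;
$loop_count++ while $text =~ /X/g;

print "$g_count $loop_count\n";
```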

Of course, if you said:

    @array = /stuff (regexp)/g;

this is effectively the same as

    @array = grep($i++%2, split(/stuff (regexp)/));

except that once again, it's not utterly intuitive and will go making
more tmp values than it really needs to -- although only twice as many.
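
A small sketch of that equivalence, assuming a Perl where list-context
m//g returns the captured substrings.  The sample text and the \w+
pattern stand in for "stuff (regexp)" and are not from the thread.

```perl
use strict;
use warnings;

my $text = "stuff one, stuff two, stuff three";   # hypothetical input

# List-context /g returns just the parenthesized captures.
my @via_g = $text =~ /stuff (\w+)/g;

# split with a capturing pattern interleaves fields and captures:
# field, capture, field, capture, ...  Keeping the odd positions
# leaves only the captures.
my $i = 0;
my @via_split = grep { $i++ % 2 } split /stuff (\w+)/, $text;

print "@via_g\n@via_split\n";
```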

I've also been thinking more about 

    while (/foo/) {

and somehow making that an iterator that starts the match from where it
left off.  I think a decent way to do this would be to use a /n flag 
indicating "next match".  Thus the syntax would be //n, or m//n, not
n//.  It's really still a match, just with a special variation, so doesn't
particularly merit an entirely new operator.  Perl would keep a pointer
into the string being matched against, advancing it with each match
until it ran out.

    while ($foo =~ /bar/n) {

is certainly one possibility, but another possible use would be:

    if (/foo/ && /bar/n && /baz/n)

which might be faster than 

    if (/foo.*bar.*baz/)
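
The /n flag is only a proposal here.  As a rough illustration of the same
idea, the sketch below uses scalar-context m//g together with pos(), which
in later Perl releases keeps a per-string position and resumes from it.
Treat that mapping as an assumption; the strings and the in_order() helper
are invented for the example.

```perl
use strict;
use warnings;

# Walk through every match, resuming where the previous one ended.
my $foo = "bar none, barely a bar";   # hypothetical sample string
my @ends;
while ($foo =~ /bar/g) {
    push @ends, pos($foo);            # offset just past each match
}

# Ordered search: each pattern starts where the previous one stopped.
sub in_order {
    my ($s) = @_;                     # fresh copy, so no saved position
    return ($s =~ /foo/g && $s =~ /bar/g && $s =~ /baz/g) ? 1 : 0;
}

print "@ends\n";
print in_order("foo, then bar, then baz"), "\n";
print in_order("baz bar foo"), "\n";
```

Whether the chained form actually beats /foo.*bar.*baz/ would have to be
measured; the sketch only shows the semantics.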


A question is what you do on failure.  For example, does this make sense:

    if (/foo/) {
	if (/bar/n) { } 
	elsif (/baz/n) { } 
    } 

If the /bar/n failed, could the /baz/n search start from the same place
as the /bar/n started?
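
One possible answer, sketched with the /g and /c match flags of later Perl
releases (an assumption; /n itself does not exist).  There a failed m//g
rewinds the saved position to the start of the string, while m//gc leaves
it where it was, so the next pattern starts from the same place.  The
classify() helper and its inputs are hypothetical.

```perl
use strict;
use warnings;

# Returns the name of the branch that fired.  With $keep true the /c
# flag is used, so a failed match leaves the saved position alone.
sub classify {
    my ($s, $keep) = @_;
    return "no foo" unless $s =~ /foo/g;
    if ($keep) {
        return "bar" if $s =~ /bar/gc;
        return "baz" if $s =~ /baz/gc;   # starts where /bar/ started
    } else {
        return "bar" if $s =~ /bar/g;    # failure rewinds to the start
        return "baz" if $s =~ /baz/g;    # so this searches from offset 0
    }
    return "neither";
}

print classify("baz foo qux", 1), "\n";
print classify("baz foo qux", 0), "\n";
```

In the first call the "baz" before "foo" is never seen, because the search
resumes after "foo"; in the second call it is found, because the failed
/bar/ threw the position away.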

Another question is when to reset your state.  Do you have to know when
the variable you're matching against has been written?  Do you reset
every time the variable is matched against without the /n switch?  On
further contemplation, I think for efficiency you'd want to make the user
put in a /n if he ever wanted to do a next match.  Otherwise it'd be too
much overhead. That makes the above fragment like this:

    if (/foo/n) {
	if (/bar/n) { } 
	elsif (/baz/n) { } 
    } 

I still don't know when to reset the state.  And does /n make sense for
the s/// operator?
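
For what it's worth, here is how the reset question plays out with the
m//g and pos() machinery of later Perl releases (again an assumption
standing in for /n): the position can be cleared by hand, and assigning
to the string clears it as well.  Variable names are made up.

```perl
use strict;
use warnings;

my $str = "aXbXc";                 # hypothetical sample string

$str =~ /X/g or die "no match";
my $after_match = pos($str);       # position just past the first X

pos($str) = undef;                 # explicit reset by the user
my $after_reset = pos($str);

$str =~ /X/g or die "no match";    # position is set again
$str = "XXXX";                     # writing the variable resets it
my $after_assign = pos($str);

print defined $after_reset  ? "still set\n" : "reset by hand\n";
print defined $after_assign ? "still set\n" : "reset by assignment\n";
```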

I think this /n switch needs a bit more thought and discussion, maybe from
some of you who've done more complex pattern operations in other languages.  

The /g switch, on the other hand, seems much more straightforward and
could work just as I've described it above without shocking anyone.
Larry, what's your take on all this?

:>I know, I know... along that road lies APL and madness.
:
:Too late. Perl is already weirder than APL. Uglier too.

Oh good, does that mean we'll get

    @a += @b;
    @c = @a + @b;

one of these days then? :-)
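
Until then, element-wise addition has to be spelled out.  A minimal
sketch, assuming both arrays have the same length; the names @x, @y and
@sum are used instead of @a, @b and @c purely for illustration.

```perl
use strict;
use warnings;

my @x = (1, 2, 3);
my @y = (10, 20, 30);

# The "@c = @a + @b" case: build a new array of pairwise sums.
my @sum = map { $x[$_] + $y[$_] } 0 .. $#x;

# The "@a += @b" case: add into @x in place.
$x[$_] += $y[$_] for 0 .. $#x;

print "@sum\n@x\n";
```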

Speaking of array operations, consider this.  You have an associative array
of colors and values, as from the perl man page:

    %map = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);

So that $map{'red'} == 0x00f and so on.  What if you want to 
invert the array so you can compute $map{0x00f}?  Well, certainly
you can do this semi-awkishly:

    for $color (keys %map) {
	$nmap{$map{$color}} = $color;
    } 

or even this in a lispy, semi-mapcar kind of way:

    grep( $nmap{$map{$_}} = $_, keys %map ); 

but even grep is too much work.  I think the true perl idiom is 

    @nmap{values %map} = keys %map;

which works just fine, is quite obvious about what it's doing, and seems
more in line with the Perlian Way.  I don't believe I've ever seen anyone 
do that before.
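
A runnable sketch comparing the loop and the slice on the %map from above.
The slice relies on keys and values returning their lists in matching
order for an unmodified table.  The reverse form at the end is an extra
variant that works in current Perl, not something from this thread, and
the %by_* names are invented.

```perl
use strict;
use warnings;

my %map = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);

# Explicit loop, one assignment per color.
my %by_loop;
for my $color (keys %map) {
    $by_loop{ $map{$color} } = $color;
}

# Slice assignment: values become the new keys, keys the new values.
my %by_slice;
@by_slice{ values %map } = keys %map;

# Extra variant: reversing the flattened key/value list swaps each pair.
my %by_reverse = reverse %map;

print "$by_slice{0x00f} $by_slice{0x0f0} $by_slice{0xf00}\n";
```

If two colors shared a value, all of these would silently keep only one
of them, and which one is not guaranteed to be the same across the forms.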

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time."