Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!spool.mu.edu!uunet!rbj
From: rbj@uunet.uu.net (Root Boy Jim)
Newsgroups: comp.lang.perl
Subject: Re: Counting RE occurrences
Keywords: regexps, /g, /n, split, arrays, APL, invert
Message-ID: <1991May21.184545.26905@uunet.uu.net>
Date: 21 May 91 18:45:45 GMT
Article-I.D.: uunet.1991May21.184545.26905
References: <1991May13.225603.29819@convex.com> <1991May16.010149.20536@uunet.uu.net> <1991May17.132403.12104@convex.com>
Organization: UUNET Communications Services, Falls Church, VA
Lines: 123

tchrist@convex.COM (Tom Christiansen) writes:
?From the keyboard of rbj@uunet.uu.net (Root Boy Jim):
?:
?:Quite simply, the answer is: split(/RE/,exp) - 1;
?:
?There are a couple of problems in using split for this.  I think it
?has more overhead than it needs to have.  If all you want is the
?count of the exprs, there's no reason to go making all those @_ 
?values that you'll be creating as a side-effect of the split.

Yes. It's merely the conceptually simplest.

?    $count = /regexp/g;
?
?it would not need to create all those values, and seems more intuitive.  

I second the notion. It says what it means.
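A minimal sketch of both counting approaches, assuming a Perl 5 interpreter. The list-context //g count is how later perls behave, not something available when this was posted. The helper names count_by_split, count_by_split_all and count_by_match are hypothetical. The sketch also shows one pitfall of the split answer: split drops trailing empty fields, so a match at the very end of the string goes uncounted unless split is given a negative LIMIT.

```perl
use strict;
use warnings;

# Hypothetical helper: the split(/RE/,exp) - 1 answer.
# Trailing empty fields are dropped, so a match at the end is missed.
sub count_by_split {
    my ($str, $re) = @_;
    my @fields = split /$re/, $str;
    return @fields - 1;
}

# Same idea with LIMIT -1, which keeps trailing empty fields.
sub count_by_split_all {
    my ($str, $re) = @_;
    my @fields = split /$re/, $str, -1;
    return @fields - 1;
}

# Hypothetical helper: //g in list context, assigned to an empty list;
# the list assignment in scalar context yields the number of matches.
# Assumes $re has no capturing groups (those would change the count).
sub count_by_match {
    my ($str, $re) = @_;
    my $count = () = $str =~ /$re/g;
    return $count;
}

my $s = "a,b,c,";
print count_by_split($s, ','), "\n";        # 2: the trailing comma is lost
print count_by_split_all($s, ','), "\n";    # 3
print count_by_match($s, ','), "\n";        # 3
```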

?I've also been thinking more about 
?
?    while (/foo/) {
?
?and somehow making that an iterator that starts the match from where it
?left off.  I think a decent way to do this would be to use a /n flag 
?indicating "next match".  Thus the syntax would be //n, or m//n.

I am leery of these operators with embedded state. It's just
another thing that has to be cleaned up.

?    while ($foo =~ /bar/n) {
?
?Is certainly one possibility, but another possible use would be:
?
?    if (/foo/ && /bar/n && /baz/n)

This really bothers me. It is one thing for each textual operator
to save its own state, quite another to refer to someplace different
in the program. Yes, I can see that they match the same variable.

Consider the following program:

	while ($a=<*>) {
		if (++$c & 1)  {
			$b=<*>;
		} else {
			$b='';
		}
		print "$a\t$b\n";
	}

You can see that each operator retains its own state.
A closure if you will. In the m//n case, the remembered
position would have to be stored with the variable perhaps.
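For what it's worth, later perls kept the remembered position with the variable, much as suggested here: scalar-context //g resumes where the last successful match on that string ended, and pos() exposes the offset. A sketch of the `while ($foo =~ /bar/n)` loop written that way, assuming Perl 5; match_positions is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a string with scalar-context //g. Each
# iteration resumes after the previous successful match; the position
# is stored with $str itself (readable via pos($str)), not with the
# match operator.
sub match_positions {
    my ($str, $re) = @_;
    my @starts;
    while ($str =~ /$re/g) {
        push @starts, $-[0];    # offset where this match began
    }
    return @starts;
}

my $foo = "bar foo bar baz bar";
print join(",", match_positions($foo, 'bar')), "\n";    # 0,8,16
```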

?which might be faster than 
?
?    if (/foo.*bar.*baz/)

But speed isn't everything.

?A question is what you do on failure.  For example, does this make sense:
?
?    if (/foo/) {
?	if (/bar/n) { } 
?	elsif (/baz/n) { } 
?    } 
?
?If the /bar/n failed, could the /baz/n search start from the same place
?as the /bar/n started?

Yes, but it takes a while to figger that out.
Advance the pointer only on successful matches.
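A sketch of the "advance only on success" rule, assuming Perl 5's /c modifier: combined with /g, a failed match leaves pos() where it was instead of resetting it, so the /baz/ search starts from the same place the failed /bar/ search did. which_after_foo is a hypothetical name.

```perl
use strict;
use warnings;

# With /gc a failed match does not reset pos(), so the elsif branch
# searches from just after "foo", exactly where /bar/ began.
sub which_after_foo {
    my ($str) = @_;
    if ($str =~ /foo/gc) {
        if    ($str =~ /bar/gc) { return "bar at " . $-[0] }
        elsif ($str =~ /baz/gc) { return "baz at " . $-[0] }
        return "neither";
    }
    return "no foo";
}

print which_after_foo("foo baz"), "\n";        # baz at 4
print which_after_foo("baz foo bar"), "\n";    # bar at 8
print which_after_foo("baz foo"), "\n";        # neither: baz precedes foo
```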

?I think this /n switch needs a bit more thought and discussion, maybe from
?some of you who've done more complex pattern operations in other languages.  

I think it should be killed right here.

I think we would need an explicit position argument.
Such a beast almost exists: index. If only it did RE's.

Then the code would be something like:

	for ($cnt=$pos=0; ($pos=index($string,$RE,$pos)) >= 0; $pos+=length($&))
		{ $cnt++; }
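A sketch of the regex-capable index wished for above, written as a hypothetical re_index built on //g and pos(), assuming Perl 5 (it is not a builtin). It returns the offset and length of the first match at or after the given position, or an empty list, which sidesteps index's -1 versus 0 ambiguity. The count_re loop follows the shape of the for loop above.

```perl
use strict;
use warnings;

# Hypothetical re_index: like index(), but the needle is a regular
# expression. Returns (offset, length) of the first match at or after
# $pos, or an empty list when there is none.
sub re_index {
    my ($string, $re, $pos) = @_;
    return if $pos > length $string;
    pos($string) = $pos;               # start the //g search here
    return unless $string =~ /$re/g;
    return ($-[0], $+[0] - $-[0]);     # match start and match length
}

# Counting loop in the shape of the for loop above. A zero-length match
# would not move $pos, so step one character past it to avoid spinning.
sub count_re {
    my ($string, $re) = @_;
    my ($cnt, $pos) = (0, 0);
    while (my ($at, $len) = re_index($string, $re, $pos)) {
        $cnt++;
        $pos = $at + ($len || 1);
    }
    return $cnt;
}

print count_re("one two  three", '\s+'), "\n";    # 2
```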

?The /g switch, on the other hand, seems much more straightforward and
?could work just as I've described it above without shocking anyone.
?Larry, what's your take on all this?
?
?:>I know, I know... along that road lies APL and madness.
?:
?:Too late. Perl is already weirder than APL. Uglier too.
?
?Oh good, does that mean we'll get
?
?    @a += @b;
?    @c = @a + @b;
?
?one of these days then? :-)

Not to mention @a += $b;
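To make the joke concrete, here is one reading of those operators as hypothetical subroutines, assuming element-by-element semantics and equal-length arrays; perl itself has no such built-in array operators.

```perl
use strict;
use warnings;

# @c = @a + @b read element by element (assumes equal lengths).
sub add_arrays {
    my ($x, $y) = @_;
    return map { $x->[$_] + $y->[$_] } 0 .. $#$x;
}

# @a += $b read as: add the scalar to every element, in place.
sub add_scalar_in_place {
    my ($x, $n) = @_;
    $_ += $n for @$x;    # foreach aliases $_ to each element
    return;
}

my @p = (1, 2, 3);
my @q = (10, 20, 30);
my @r = add_arrays(\@p, \@q);
add_scalar_in_place(\@p, 5);
print "@r | @p\n";    # 11 22 33 | 6 7 8
```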

?or even this in a lispy, semi-mapcar kind of way:
?
?    grep( $nmap{$map{$_}} = $_, keys %map ); 
?
?but even grep is too much work.  I think the true perl idiom is 
?
?    @nmap{values %map} = keys %map;
?
?which works just fine, is quite obvious about what it's doing, and seems
?more in line with the Perlian Way.  I don't believe I've ever seen anyone 
?do that before.
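The slice idiom can be checked with a small sketch. It relies on keys and values returning entries in the same order for an unmodified hash; if two keys share a value, only one of them survives in the inverted hash. The sample %map contents are made up.

```perl
use strict;
use warnings;

my %map = (one => 1, two => 2, three => 3);    # made-up sample data
my %nmap;
@nmap{values %map} = keys %map;    # invert: each value becomes a key
print "$nmap{2}\n";                # two
```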

LISP lets you search an alist either way, by key (assoc) or by value (rassoc).
That is obviously better than keeping two separate structures.

And I believe APL allows you to do the equivalent of "@a[1,3,5] = (1,9,25)".
However, APL doesn't have associative arrays.
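For comparison, the same slice assignment written in Perl. The sketch (sample values taken from the line above) also shows that the skipped elements are left undefined.

```perl
use strict;
use warnings;

# Slice assignment: elements 1, 3 and 5 get values; 0, 2 and 4 stay undef.
my @a;
@a[1, 3, 5] = (1, 9, 25);

print scalar(@a), "\n";    # 6
print join(",", map { defined($_) ? $_ : "undef" } @a), "\n";    # undef,1,undef,9,undef,25
```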
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane