Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!spool.mu.edu!uunet!rbj
From: rbj@uunet.uu.net (Root Boy Jim)
Newsgroups: comp.lang.perl
Subject: Re: Counting RE occurrences
Keywords: regexps, /g, /n, split, arrays, APL, invert
Message-ID: <1991May21.184545.26905@uunet.uu.net>
Date: 21 May 91 18:45:45 GMT
Article-I.D.: uunet.1991May21.184545.26905
References: <1991May13.225603.29819@convex.com> <1991May16.010149.20536@uunet.uu.net> <1991May17.132403.12104@convex.com>
Organization: UUNET Communications Services, Falls Church, VA
Lines: 123

tchrist@convex.COM (Tom Christiansen) writes:
?From the keyboard of rbj@uunet.uu.net (Root Boy Jim):
?:
?:Quite simply, the answer is: split(/RE/,exp) - 1;
?:
?There are a couple of problems in using split for this.  I think it
?has more overhead than it needs to have.  If all you want is the
?count of the exprs, there's no reason to go making all those @_
?values that you'll be creating as a side-effect of the split.

Yes. It's merely conceptually the simplest.

?    $count = /regexp/g;
?
?it would not need to create all those values, and seems more intuitive.

I second the notion. It says what it means.

?I've also been thinking more about
?
?    while (/foo/) {
?
?and somehow making that an iterator that starts the match from where it
?left off.  I think a decent way to do this would be to use a /n flag
?indicating "next match".  Thus the syntax would be //n, or m//n.

I am leery of these operators with embedded state.
It's just another thing that has to be cleaned up.

?    while ($foo =~ /bar/n) {
?
?Is certainly one possibility, but another possible use would be:
?
?    if (/foo/ && /bar/n && /baz/n)

This really bothers me. It is one thing for each textual operator
to save its own state, quite another to refer to someplace different
in the program. Yes, I can see that they match the same variable.
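
For what it's worth, both the counting question and the "start from
where it left off" iterator can be sketched with m//g. This assumes a
later Perl (5.x) in which /g works in list and in scalar context, which
is not the Perl this thread was written against, and the sub names are
made up for illustration. The split version needs a negative limit,
since split otherwise drops trailing empty fields and undercounts when
the string ends in a match.

```perl
use strict;
use warnings;

# Hypothetical helpers, not from the thread; they assume Perl 5 semantics.

# split-based count: number of fields minus one. The -1 limit keeps
# trailing empty fields, which split would otherwise discard.
sub count_split {
    my ($str, $re) = @_;
    return 0 unless length $str;    # splitting '' yields no fields at all
    my @fields = split /$re/, $str, -1;
    return @fields - 1;
}

# List-context /g returns every match; assigning that list to () and
# then to a scalar yields how many matches there were.
sub count_list {
    my ($str, $re) = @_;
    my $count = () = $str =~ /$re/g;
    return $count;
}

# Scalar-context /g resumes after the previous match. The position is
# kept with the variable (see pos), not with the operator.
sub count_iter {
    my ($str, $re) = @_;
    my $count = 0;
    $count++ while $str =~ /$re/g;
    return $count;
}

my $s = "abxxabyyab";
printf "%d %d %d\n",
    count_split($s, qr/ab/), count_list($s, qr/ab/), count_iter($s, qr/ab/);
```

Capturing parentheses in the pattern change what split and list-context
/g return, so the sketch only holds for patterns without them.
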
Consider the following program:

    while ($a=<*>) {
        if (++$c & 1) {
            $b=<*>;
        } else {
            $b='';
        }
        print "$a\t$b\n";
    }

You can see that each operator retains its own state.
A closure if you will. In the m//n case, the remembered
position would have to be stored with the variable perhaps.

?which might be faster than
?
?    if (/foo.*bar.*baz/)

But speed isn't everything.

?A question is what you do on failure.  For example, does this make sense:
?
?    if (/foo/) {
?        if (/bar/n) { }
?        elsif (/baz/n) { }
?    }
?
?If the /bar/n failed, could the /baz/n search start from the same place
?as the /bar/n started?

Yes, but it takes a while to figger that out.
Advance the pointer only on successful matches.

?I think this /n switch needs a bit more thought and discussion, maybe from
?some of you who've done more complex pattern operations in other languages.

I think it should be killed right here. I think we would need an
explicit position argument. Such a beast almost exists: index.
If only it did RE's. Then the code would be something like:

    for ($cnt=$pos=0; ($pos=index($string,$RE,$pos)) >= 0; $pos+=length($&)) {
        $cnt++;
    }

?The /g switch, on the other hand, seems much more straight-forward and
?could work just as I've described it above without shocking anyone.
?Larry, what's your take on all this?
?
?:>I know, I know... along that road lies APL and madness.
?:
?:Too late. Perl is already weirder than APL. Uglier too.
?
?Oh good, does that mean we'll get
?
?    @a += @b;
?    @c = @a + @b;
?
?one of these days then? :-)

Not to mention @a += $b;

?or even this in a lispy, semi-mapcar kind of way:
?
?    grep( $nmap{$map{$_}} = $_, keys %map );
?
?but even grep is too much work.  I think the true perl idiom is
?
?    @nmap{values %map} = keys %map;
?
?which works just fine, is quite obvious about what it's doing, and seems
?more in line with the Perlian Way.  I don't believe I've ever seen anyone
?do that before.

LISP allows you to search an alist either way.
This is obviously better than two separate structures.
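
The slice idiom quoted above can be tried directly. A minimal sketch,
assuming a Perl 5 with "my" and strict (the thread predates it) and a
made-up %map. It relies on keys and values walking an unmodified hash
in the same order, and on the values being unique, since duplicates
would overwrite one another in the inverted table.

```perl
use strict;
use warnings;

# Sample data is hypothetical, chosen only for illustration.
my %map = (one => 1, two => 2, three => 3);

# Invert with a hash slice: keys and values come back in matching
# order as long as %map is not modified in between.
my %nmap;
@nmap{ values %map } = keys %map;

# Same result by reversing the flattened key/value list.
my %nmap2 = reverse %map;

print "$_ => $nmap{$_}\n" for sort keys %nmap;
```

Either way you still end up with two separate structures, one per
direction of lookup.
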
And I believe APL allows you to do the equivalent
of "@a[1,3,5] = (1,9,25)". However, APL doesn't have
associative arrays.
-- 
[rbj@uunet 1] stty sane
unknown mode: sane