Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!mcsun!ukc!slxsys!ibmpcug!demon!news
From: pmoore@cix.compulink.co.uk (Paul Moore)
Newsgroups: comp.lang.perl
Subject: Counting RE occurrences
Message-ID: <1991May13.184504.13844@demon.co.uk>
Date: 13 May 91 18:45:04 GMT
Sender: news@demon.co.uk (C-News Owner)
Reply-To: Paul Moore
Organization: Gated to News by demon.co.uk
Lines: 72

This is one of those problems which I am convinced ought to have a simple
(probably one-line) solution in perl, but I sure can't find it...

I have a string, which contains a piece of text. I also have a regular
expression. I want to count the number of times the RE appears in the
string. I am aware that obnoxious REs, such as ones which match the empty
string, and ones which overlap themselves, can make even *defining* the
idea of "the number of times this RE appears in this string" difficult,
but for straightforward cases the intention is clear.

As an example (this is the task which first made me want to do this), I
have a file, which has been copied from an MS-DOS box to my (non-MS-DOS)
machine. So the lines in the file are delimited by "\r\n", and not just
"\n". I have slurped the file into a string, in order to do some
processing, and I need to count the number of lines. So what I want to do
is count the number of occurrences of the string "\r\n" in the string. IE,

    open(DOS,"Ms-dos-file");
    undef $/;
    $str = <DOS>;                       # Slurp
    .... processing on $str ...
    $lines = &count($str, "\r\n");      # Somehow...
    .... more processing ...

The only way I can see, which works for a general RE, is

    $count = ($str =~ s/RE/$&/g);

but the idea of doing global substitution, and using $&, strikes me as a
bit inefficient...

Another example, which shows why a general RE is better than just a
string, is if I am trying to write a wc clone.
So we have

    open(FILE, $ARGV[0]);
    undef $/;
    $str = <FILE>;
    $chars = length($str);
    # Don't worry about funny line terminators this time, and note
    # that we can use the return value of tr/// for single character
    # counts...
    $lines = ($str =~ tr/\n//);

It seems to me that a nice way of counting words would be to count the
occurrences of the pattern /\b/, and divide by 2. With perl's blindingly
efficient pattern matching, this may be a very fast method.

Obviously, in most individual cases, there are alternative ways of doing
what I want. However, counting REs strikes me as a very "perl-ish" sort
of activity, and I would have expected it to be built in, somehow. Perhaps
as the return value of m// (which specifically isn't the case).

Comments, anyone?

Gustav.

PS Sorry if this has already appeared, but I don't think it made it out
of my system...

E-Mail: pmoore%cix@ukc.ac.uk
    or: gustav@tharr.UUCP
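
For illustration, the counting asked for above can be sketched as follows. This is only a sketch under stated assumptions: it assumes a Perl 5 interpreter (it uses "my" and qr//, which postdate the perl of this post), and count_matches is a hypothetical helper name, not a built-in. It relies on m//g returning all matches in list context; assigning that list to an empty list and reading the assignment in scalar context gives the number of matches. The substitution form from the post is shown beside it for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_matches is a hypothetical helper, not part of Perl itself.
# Caveat: if the pattern contains capturing parentheses, list-context
# m//g returns one element per capture group per match, so the result
# is only a match count for patterns without capture groups.
sub count_matches {
    my ($str, $re) = @_;
    my $count = () = $str =~ /$re/g;   # list assignment read in scalar context
    return $count;
}

# Sample data standing in for a slurped MS-DOS file (an assumption).
my $dos_text = "one\r\ntwo\r\nthree\r\n";

my $lines = count_matches($dos_text, qr/\r\n/);
print "lines: $lines\n";

# The substitution form from the post; s///g returns the number of
# substitutions, or an empty string when nothing matched, hence "|| 0".
my $copy = $dos_text;
my $subs = ($copy =~ s/\r\n/$&/g) || 0;
print "substitutions: $subs\n";

# The word-count idea from the post: count /\b/ and divide by 2.
my $words = count_matches("hello big world", qr/\b/) / 2;
print "words: $words\n";
```

The list-assignment form leaves the string untouched and does not need $&, which addresses the efficiency worry about the substitution form; whether it is actually faster would need measuring.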