Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.editors
Subject: Re: Multiple line regexps
Keywords: regexp, multiple lines
Message-ID: <1991Jun03.000600.21967@convex.com>
Date: 3 Jun 91 00:06:00 GMT
References: <1991Jun2.231351.10229@trl.oz.au>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Distribution: comp
Organization: CONVEX Software Development, Richardson, TX
Lines: 30
Nntp-Posting-Host: pixel.convex.com

From the keyboard of soh@andromeda.trl.OZ.AU (kam hung soh):
:I would like to write a regular expression which can look for patterns
:longer than one line.  For example, I want to find the first line of
:each paragraph.  If I try this regexp in grep or awk, /^$^.+$/, nothing
:happens.  Admittedly, I could replace newlines with a unique character
:say '~', before I process my file, but I wondered if regexps can be used
:across a newline boundary.

Not in most text processing languages, but I'll offer two alternatives.

Rob Pike once explained to me that his screen editor, sam, could handle
such things because it doesn't have hard-wired in what a line is.  Sadly,
sam is not available for free (although it's probably cheap from the
AT&T toolbox) so I've not used it.  Perhaps someone who has might comment.

Another possibility is to use perl, which isn't really an interactive
editor, but is certainly a superset of sed, awk, and sh, at the very
least.  Perl has no problems with regexps spanning multiple lines.  While
the default records processed are a line at a time, you can switch this
to paragraph mode (records delimited by newline pairs) or even whole-file
mode, in which you can slurp the entire file into the pattern space.
There's no problem saying something like s/\n\nX\n//g then.
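For readers who want to try the record-mode idea without perl at hand, here is a rough sketch in Python. It is only an analogue of the perl behaviour described above, not perl itself: the sample text and the variable names are made up for illustration, and splitting on runs of blank lines is an assumed stand-in for paragraph mode.

```python
import re

# Hypothetical sample input: two paragraphs separated by a blank line.
text = (
    "First para, line one.\n"
    "First para, line two.\n"
    "\n"
    "Second para, line one.\n"
    "Second para, line two.\n"
)

# "Paragraph mode" analogue: treat runs of two or more newlines as the
# record separator, so each record is one whole paragraph.
paragraphs = [p for p in re.split(r"\n{2,}", text.strip("\n")) if p]

# The first line of each paragraph, which is what the question asked for.
first_lines = [p.split("\n", 1)[0] for p in paragraphs]
print(first_lines)

# "Whole-file mode" analogue: once everything is in one string, a pattern
# may contain newlines, as in the s/\n\nX\n//g substitution above.
cleaned = re.sub(r"\n\nX\n", "", "a\n\nX\nb\n")
print(repr(cleaned))
```

The point is only that once the record is larger than a line, a newline in the pattern is an ordinary character to match.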
In fact, there's an internal variable you can set to change the
definitions of ^ and $ to mean not just at the beginning or end of a
string, but rather anywhere after and before a newline as well, which is
often handy.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	"Perl is to sed as C is to assembly language."  -me
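As a hedged illustration of what redefining ^ and $ buys you, the sketch below uses Python's re.MULTILINE flag, which plays a similar role to the perl variable mentioned in the post but is not that variable. The sample text and names are assumptions made for the example.

```python
import re

# Hypothetical sample input, held in one string (whole-file style).
text = "alpha\n\nbeta one\nbeta two\n\ngamma\n"

# By default ^ matches only at the very start of the string.
default = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches just after every newline.
multi = re.findall(r"^\w+", text, flags=re.MULTILINE)

# First line of each paragraph: a line at the start of the text, or a line
# that follows an empty line (^ followed immediately by a newline).
first_lines = re.findall(r"(?:\A|^\n)(.+)$", text, flags=re.MULTILINE)

print(default)
print(multi)
print(first_lines)
```

The last pattern is one way to get at the original question once ^ and $ can match inside the string; other formulations would work as well.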