Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!ucla-cs!lanai.cs.ucla.edu!gast
From: gast@lanai.cs.ucla.edu (David Gast)
Newsgroups: comp.editors
Subject: Re: Multiple line regexps
Keywords: regexp, multiple lines
Message-ID: <1991Jun3.013205.412@cs.ucla.edu>
Date: 3 Jun 91 01:32:05 GMT
References: <1991Jun2.231351.10229@trl.oz.au>
Sender: usenet@cs.ucla.edu (Mr. News Himself)
Distribution: comp
Organization: UCLA Computer Science Department
Lines: 43
Nntp-Posting-Host: lanai.cs.ucla.edu

In article <1991Jun2.231351.10229@trl.oz.au> soh@andromeda.trl.OZ.AU (kam hung soh) writes:
>I would like to write a regular expression which can look for patterns
>longer than one line.  For example, I want to find the first line of
>each paragraph.  If I try this regexp in grep or awk, /^$^.+$/, nothing
>happens.

Most Unix commands are line oriented, so the quick answer is no, but ...
Sed allows patterns to span more than one line.  With a little bit
of programming, you can also have awk recognize patterns across lines:
just save each line in a variable, then test whether the saved (old)
line matches one pattern and the new line matches another.  I realize
this statement is not very clear, so let me give a concrete example.  (I
have not checked this code, so it may have a typo or two, but the
example should be clear.)

Suppose you want to define a new paragraph as occurring when the previous
line is null (you may want to make it null or only white space, since people
do put spaces or tabs on otherwise null lines or at the end of lines) and
the current line is non-null (you could instead require that it be indented
five spaces, begin with a capital, etc.).  This program prints the first line
of every new paragraph; you can revise it to suit your needs.

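awk '
	# Print the current line if it is non-null and the previous one was null.
	$0 ~ /./ && oldline ~ /^$/ { print $0 }
	# Remember the current line for the next comparison.
	{ oldline = $0 }
' arguments-go-here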

Note: If the first line of the file has text on it, it will print it
since oldline is implicitly initialized to null.
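
As for the "null or only white space" variation mentioned above, here is a
rough sketch of how the patterns might be loosened.  The function name
first_lines and the sample input are my own inventions for illustration,
and the bracket expressions assume an awk that understands \t inside a
regexp (the common ones do).

```shell
# first_lines is a hypothetical helper name, not something from the post.
first_lines() {
    awk '
        # Print a line that holds at least one non-blank character when
        # the previous line was empty or held only spaces and tabs.
        $0 ~ /[^ \t]/ && oldline ~ /^[ \t]*$/ { print }
        # Remember the current line for the next comparison.
        { oldline = $0 }
    ' "$@"
}

# Made-up sample input: the third line contains a space and a tab.
printf 'First para\nmore text\n \t\nSecond para\n\nThird\n' | first_lines
```

On that sample input this should print "First para", "Second para" and
"Third", one per line.  The whitespace-only line counts as a paragraph
break but is never itself printed.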

Obviously, perl could also do this since perl can do everything.  :-)
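
For the sed route mentioned at the top, something along these lines might
work.  It is only a sketch: it keeps a two-line window in the pattern
space with N and D, so that a single regexp (/^\n./) can see an empty line
followed by a non-empty one.  The function name first_lines_sed and the
sample input are made up for the example.

```shell
# first_lines_sed is a hypothetical helper name, not something from the post.
first_lines_sed() {
    sed -n '
# A non-empty first line of the file starts the first paragraph.
1{
/./p
}
# Append the next input line, so the pattern space holds two lines.
$!N
# An empty line followed by a non-empty one: drop the leading newline
# and print what is left, which is the first line of a paragraph.
/^\n./{
s/^\n//
p
}
# Discard the older line and restart the cycle with the newer one.
D
' "$@"
}

# Made-up sample input with one and then two blank lines between paragraphs.
printf 'First para\nmore text\n\nSecond para\n\n\nThird\n' | first_lines_sed
```

Unlike the awk version, the two-line test here really is one regexp
applied across a newline, which is closer to what was originally asked.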

David Gast


>Admittedly, I could replace newlines with a unique character
>say '~', before I process my file, but I wondered if regexps can be used
>across a newline boundary.