Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!ucla-cs!lanai.cs.ucla.edu!gast
From: gast@lanai.cs.ucla.edu (David Gast)
Newsgroups: comp.editors
Subject: Re: Multiple line regexps
Keywords: regexp, multiple lines
Message-ID: <1991Jun3.013205.412@cs.ucla.edu>
Date: 3 Jun 91 01:32:05 GMT
References: <1991Jun2.231351.10229@trl.oz.au>
Sender: usenet@cs.ucla.edu (Mr. News Himself)
Distribution: comp
Organization: UCLA Computer Science Department
Lines: 43
Nntp-Posting-Host: lanai.cs.ucla.edu

In article <1991Jun2.231351.10229@trl.oz.au> soh@andromeda.trl.OZ.AU (kam hung soh) writes:
>I would like to write a regular expression which can look for patterns
>longer than one line.  For example, I want to find the first line of
>each paragraph.  If I try this regexp in grep or awk, /^$^.+$/, nothing
>happens.

Most Unix commands are line oriented, so the quick answer is no, but ...

Sed allows patterns to be longer than one line.

With a little bit of programming, you can have awk recognize patterns
across lines: just save the line into a variable and then test whether
the old line matches one pattern and the new line matches another.
I realize this statement is not very clear, so let me give a concrete
example.  (I have not checked this code, so it may have a typo or two,
but the example should be clear.)

Suppose you want to define a new paragraph as occurring when the previous
line is null (you may want to make it null or only white space, since
people do put spaces or tabs on otherwise null lines or at the end of
lines) and the current line is non-null (you could instead require that
it is indented five spaces, begins with a capital, etc.).  This program
prints the first line of every new paragraph; you can revise it to suit
your needs.

    awk '
    # print a non-empty line that follows an empty one
    $0 ~ /./ && oldline ~ /^$/ { print $0 }
    # remember this line for the next comparison
    { oldline = $0 }
    ' arguments-go-here

Note: If the first line of the file has text on it, it will be printed,
since oldline is implicitly initialized to null.
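
If you also want a line holding only spaces or tabs to count as a blank
line, as suggested above, something along these lines might do.  It is
only a sketch: the function name first_lines and the variable prev_blank
are made up for illustration, and it assumes an awk that understands \t
inside a bracket expression.

```shell
# Hypothetical helper (the name is invented for this example).
# Prints the first line of each paragraph, where a paragraph break is
# an empty line or a line containing only spaces and tabs.
first_lines() {
    awk '
    # treat the start of the file like a blank line, so that a file
    # beginning with text has its first line printed
    BEGIN { prev_blank = 1 }
    # the current line has some non-blank character and follows a blank line
    /[^ \t]/ && prev_blank { print }
    # record whether this line was empty or whitespace only
    { prev_blank = ($0 ~ /^[ \t]*$/) }
    ' "$@"
}
```

Run it as "first_lines file ..." or put it at the end of a pipe.  Setting
prev_blank in a BEGIN block keeps the behaviour described in the note
above for a file whose first line has text on it.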

Obviously, perl could also do this since perl can do everything. :-)

David Gast

>Admittedly, I could replace newlines with a unique character
>say '~', before I process my file, but I wondered if regexps can be used
>across a newline boundary.