Newsgroups: comp.editors
Path: utzoo!sq!lee
From: lee@sq.sq.com (Liam R. E. Quin)
Subject: Re: Multiple line regexps
Message-ID: <1991Jun4.215913.25633@sq.sq.com>
Keywords: regexp, multiple lines
Organization: SoftQuad Inc., Toronto, Canada
X-Feet: bare
References: <1991Jun2.231351.10229@trl.oz.au>
Distribution: comp
Date: Tue, 4 Jun 91 21:59:13 GMT
Lines: 67

soh@andromeda.trl.OZ.AU (kam hung soh) writes:
>I would like to write a regular expression which can look for patterns
>longer than one line.  For example, I want to find the first line of
>each paragraph.  If I try this regexp in grep or awk, /^$^.+$/, nothing
>happens.

Although you can't match across a newline with /^$^.+$/ in most Unix
software, you can get what you want.

You _could_ do it in lex, by the way, and that would be sensible if you
were going to do the same thing often.

You can do this in sed or awk, and also in ex or vi, with a little
cleverness.  Here's how in ex or vi....

First, we could print all blank (empty) lines with
    :g/^$/p
The command
    g/reg-exp/command
tells the editor (vi, ex, ed) to run the command on every line that
matches the pattern.  The command is pretty unrestricted, although it
can't be another global (g) command...

Well, that prints all the blank lines.  We could print all lines after
a blank line:
    :g/^$/+1p
but that isn't quite right, because it goes wrong if there are two
blank lines in a row.  Ah!  That's why you had /^$^.+$/ and not
/^$^.*$/.  I see...

OK, we could do this:
    :g/^$/+1s/./&/p
This says that on the line after each blank line, try to substitute a
single character for itself (&), and if that worked print the line.

This is OK except that if the last line in the file is blank the +1 is
wrong, so we must omit the last line, and do the command on 1,$-1:
    :1,$-1g/^$/.+1s/./&/p

Wow!  Well, that's plausible.

In sed, we could use the Hold space.  I won't do that here, as it's a
little confusing to describe...
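
For readers who want to see the hold-space idea anyway, here is a
minimal sketch.  It is not from the original post: the function name
first_lines_sed and the sample input are made up for illustration, and
it assumes a POSIX-style sed.  The trick is to keep the previous line
in the hold space and print the current line only when the previous
one was empty and the current one is not.

```shell
# Hypothetical helper: print the first line of each paragraph on stdin.
# The hold space always carries the previous input line (it is empty
# before the first line is read).
first_lines_sed() {
    # x       swap: pattern space = previous line, hold = current line
    # /^$/{   only when the previous line was empty ...
    #   g     copy the current line back into the pattern space
    #   /./p  and print it if it is not itself empty
    # }
    sed -n 'x;/^$/{g;/./p;}'
}

# Made-up sample: three paragraphs, the last preceded by two blank lines.
printf 'a\nb\n\nc\nd\n\n\ne\n' | first_lines_sed
```

On the sample this prints a, c and e, one per line.  Unlike the ex
command above, it also prints the first line of the file when that
line is non-empty, because the hold space starts out empty.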

In awk, though, we could do this:
    awk '
    /^./ {
        if (last == "") print
    }
    { last = $0 }'

You can be terser with some versions of awk:
    awk '/^./{ if (last == "") print} { last = $0 }'

If you have mgrep or GNU grep, you could also grep for blank lines,
with one line of context, and grep for . on the result.

So none of these answer your real, fundamental, can-regexp-do-this
question, but they do address what you're trying to solve.

Lex can do multi-line patterns, and in Dougherty & O'Reilly's Unix Text
Processing (the big blue one) there is an example of a multi-line grep
using sed, as I recall.

Liam

-- 
Liam Quin, lee@sq.com, SoftQuad, Toronto, +1 416 963 8337
the barefoot programmer