Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.jpl.nasa.gov (Larry Wall)
Newsgroups: comp.editors
Subject: Re: Multiple line regexps
Keywords: regexp, multiple lines
Message-ID: <1991Jun6.193933.20110@jpl-devvax.jpl.nasa.gov>
Date: 6 Jun 91 19:39:33 GMT
References: <1991Jun2.231351.10229@trl.oz.au> <1991Jun03.045458.4972@convex.com> <2732@root44.co.uk>
Reply-To: lwall@netlabs.com (Larry Wall)
Distribution: comp
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 65

In article <2732@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
: Distribution: comp
: Organization: UniSoft Ltd., London, England
: Lines: 18
:
: tchrist@convex.COM (Tom Christiansen) writes:
:
: >    perl -00 -ne 'print /(.*\n)/' some_file
:
: >The -00 put us in paragraph mode, and the (.*\n) isolates the first
: >line of each paragraph for printing.
:
: The exact equivalent in awk is:
:
:     awk 'BEGIN { RS=""; FS="\n" }
:          { print $1 }' some_file
:
: The RS="" makes blank lines the record separator, and the FS="\n" allows
: the first line of the record to be obtained using "$1".

That's an exact equivalent except in one Important Respect:

    $ perl -00 -ne 'print /(.*\n)/' u.usa.va.3
    # u.usa.va.3 uucp-map@acsu.buffalo.edu
    #N ukelele
    #N un1
    #N usancon
    #N usaos
    #N .uu.net, uunet
    #N .uucom.com, uucom
    #N vast
    #N .verdix.com, vrdxhq
    #N viar
    #N virgil
    #N .virginia.edu, virginia
    #N virtech
    #N visenix
    #N visix
    #N viusys
    #N vssadm
    #N vtserf
    #N wimpy
    #N wperkins
    #N .wsrcc.com, wsrcc.com, wsrcc
    #N wyvern
    #N xlisa
    #N xrxedds
    #N yendor
    #END u.usa.va.3

    $ awk 'BEGIN { RS=""; FS="\n" }{ print $1 }' u.usa.va.3
    # u.usa.va.3 uucp-map@acsu.buffalo.edu
    #N ukelele
    #N un1
    #N usancon
    #N usaos
    Segmentation fault (core dumped)

That's on a Vax.  On Suns, at least it's polite enough to give an error
message about the line being too long.

Arbitrary limits are for the birds.  They crap on you when you're already
halfway to the celebration.

Larry Wall
lwall@netlabs.com