Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: perl memory usage?
Message-ID: <7556@jpl-devvax.JPL.NASA.GOV>
Date: 26 Mar 90 22:19:37 GMT
References: <1990Mar19.210743.15896@chinet.chi.il.us> <7480@jpl-devvax.JPL.NASA.GOV> <1990Mar26.174959.20102@chinet.chi.il.us>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 61

In article <1990Mar26.174959.20102@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
: Perhaps a funky regexp would work better anyway, but I couldn't come
: up with one.  I'm trying to merge items like:
: 
: identifier (used for key in associative array)
: text (multi-line)
: SUMMARY:
: summary-text (multi-line)
: STATUS:
: text ...
: 
: If I find an updated item without the SUMMARY: entry, I want to grab the
: summary-text from the old entry and insert it into the new above the
: STATUS line.  My first attempt at pattern-matching with bracketed substrings
: failed on these multi-line strings, so I switched to the $` and $' and
: some tmp variables.  Is there a better way?  Note that I don't know which
: (if either) entry contains the SUMMARY: or that an old entry even exists,
: so the ability to test the success of the individual matches is handy.

I'd probably write this as

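	$* = 1;			# let ^ match just after embedded newlines
	if ($new !~ /\nSUMMARY:\n/) {		# new entry has no SUMMARY:
	    # [^\0]* spans newlines; $was gets the text between the headers
	    if (($was) = ($old =~ /^SUMMARY:\n([^\0]*)^STATUS:/)) {
		substr($new,index($new,"STATUS:\n"),0) = $was;	# insert above STATUS:
	    }
	}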

or some such.  The thing to remember is that . doesn't match newline,
so use [^\0] (any character except a null) to match newlines too.
(On older patchlevels you may have to say \000 instead.)
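Modern Perl 5 no longer supports `$*`; the `/m` modifier makes `^` match after embedded newlines, and `/s` lets `.` match a newline, doing the job of `[^\0]`. Here is a minimal sketch of the same merge under those assumptions; the sub name `merge_summary` and the sample entries are made up for illustration. Like the version above, it copies only the summary text, not the `SUMMARY:` line itself.

```perl
use strict;
use warnings;

# Hypothetical helper: if $new has no SUMMARY: section, copy the
# summary text from $old into $new just above its STATUS: line.
sub merge_summary {
    my ($new, $old) = @_;
    if ($new !~ /\nSUMMARY:\n/) {
        # /m: ^ matches after any newline; /s: . also matches newline.
        # Greedy .* runs to the end of $old, then backs off to ^STATUS:.
        if (my ($was) = $old =~ /^SUMMARY:\n(.*)^STATUS:/ms) {
            my $pos = index($new, "STATUS:\n");
            substr($new, $pos, 0) = $was if $pos >= 0;   # insert above STATUS:
        }
    }
    return $new;
}

my $old = "id1\nold text\nSUMMARY:\nshort summary\nSTATUS:\nopen\n";
my $new = "id1\nnew text\nSTATUS:\nclosed\n";
print merge_summary($new, $old);
```

The `$pos >= 0` check is an addition: if the new entry had no `STATUS:` line, `index` would return -1 and the text would be spliced in just before the last character.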

Depending on the relative sizes of the text sections, it might be faster
to do it all with index, since [^\0]* has to match all the way to the
end of the string and then back off until it finds STATUS:.
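Perl 5 later added non-greedy quantifiers, which sidestep the backing-off described here: `.*?` grows the match from the left and stops at the first `STATUS:` it reaches. The sketch below uses a hypothetical entry containing two `STATUS:` lines to show the behavioral difference that comes with it: the greedy form backs off only as far as the last `STATUS:`, while the non-greedy form stops at the first.

```perl
use strict;
use warnings;

# Hypothetical entry with two STATUS: lines.
my $old = "id\nSUMMARY:\nsum\nSTATUS:\nfirst\nSTATUS:\nsecond\n";

# Greedy: .* runs to the end of the string, then backs off to the LAST ^STATUS:.
my ($greedy) = $old =~ /^SUMMARY:\n(.*)^STATUS:/ms;

# Non-greedy: .*? grows from the left and stops at the FIRST ^STATUS:.
my ($lazy) = $old =~ /^SUMMARY:\n(.*?)^STATUS:/ms;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```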

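	if (index($new, "\nSUMMARY:\n") < $[) {	# new entry lacks SUMMARY:
	    $beg = index($old, "\nSUMMARY:\n");
	    if ($beg >= $[) {
		$end = index($old, "\nSTATUS:\n");
		# "\nSUMMARY:\n" is 10 chars, so copy from $beg + 10
		# through the newline at $end: $end - $beg - 9 chars
		substr($new,index($new,"STATUS:\n"),0) =
		    substr($old,$beg + 10, $end - $beg - 9);
	    }
	}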
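The same index-based splice can be written for modern Perl, which no longer supports a nonzero `$[`, so `index` simply returns -1 on failure. The sketch below (the sub name `splice_summary` is hypothetical) also adds checks the original omits: that a `STATUS:` line follows `SUMMARY:` in the old entry, and that the new entry has a `STATUS:` line to insert above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the index-based version above.
sub splice_summary {
    my ($new, $old) = @_;
    return $new if index($new, "\nSUMMARY:\n") >= 0;   # already has one

    my $tag = "\nSUMMARY:\n";
    my $beg = index($old, $tag);
    return $new if $beg < 0;                           # nothing to copy
    my $end = index($old, "\nSTATUS:\n", $beg);
    return $new if $end < 0;                           # malformed old entry
    my $at = index($new, "STATUS:\n");
    return $new if $at < 0;                            # nowhere to insert

    # Copy from just after "\nSUMMARY:\n" through the newline that
    # precedes "STATUS:", i.e. positions $start .. $end inclusive.
    my $start = $beg + length($tag);
    substr($new, $at, 0) = substr($old, $start, $end - $start + 1);
    return $new;
}

print splice_summary("id\nnew\nSTATUS:\nclosed\n",
                     "id\nold\nSUMMARY:\nsum line\nSTATUS:\nopen\n");
```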

If there are more headers than that, it often becomes worthwhile to
make a parsing pass over the entry and put each section into a separate
variable or into an element of an associative array.  Then you end up
with wonderful statements like

	$new{'SUMMARY'} = $old{'SUMMARY'} unless $new{'SUMMARY'};
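A parsing pass of that kind might look like the sketch below. It assumes, hypothetically, that a header is a line consisting only of capital letters and a colon, that everything before the first header goes under a pseudo-key `FRONTSTUFF`, and that entries are written back out as front matter, then `SUMMARY`, then `STATUS`. The names `parse_entry` and `format_entry` are illustrative only.

```perl
use strict;
use warnings;

# Split an entry into a hash keyed by header name.
sub parse_entry {
    my ($text) = @_;
    my %entry;
    my $key = 'FRONTSTUFF';
    for my $line (split /^/, $text) {      # split /^/ keeps each line's newline
        if ($line =~ /^([A-Z]+):\n\z/) {   # a header line such as "SUMMARY:"
            $key = $1;
            $entry{$key} = '';
        } else {
            $entry{$key} .= $line;
        }
    }
    return %entry;
}

# Reassemble the entry in an assumed fixed header order.
sub format_entry {
    my (%entry) = @_;
    my $out = $entry{FRONTSTUFF} // '';
    for my $h (qw(SUMMARY STATUS)) {
        $out .= "${h}:\n$entry{$h}" if defined $entry{$h};
    }
    return $out;
}

my %old = parse_entry("id\nold\nSUMMARY:\nsum\nSTATUS:\nopen\n");
my %new = parse_entry("id\nnew\nSTATUS:\nclosed\n");
$new{'SUMMARY'} = $old{'SUMMARY'} unless $new{'SUMMARY'};
print format_entry(%new);
```

Unlike the splicing versions, this one writes the `SUMMARY:` line back out along with its text. Headers outside the fixed list are silently dropped by `format_entry`, so a fuller version would also record the order in which headers appeared.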

A funky split like

	@new = split(/(^[A-Z]+):\n/,$new);	# with $* = 1, so ^ matches after newlines
	unshift(@new,"FRONTSTUFF");	# key for the text before the first header
	%new = @new;		# alternating keys and values

comes to mind.  But that's probably not worthwhile for your thing.
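Because the pattern contains capturing parentheses, `split` returns each captured header name between the pieces of text, which is what produces the alternating key/value list. A modern-Perl rendering of the same idea is sketched below; `/m` stands in for `$* = 1`, and the sample entry is hypothetical. Each value keeps its trailing newline.

```perl
use strict;
use warnings;

my $new = "id\nnew text\nSUMMARY:\nsum\nSTATUS:\nclosed\n";

# The capture group makes split return each header name between the fields.
my @new = split(/(^[A-Z]+):\n/m, $new);
unshift(@new, "FRONTSTUFF");    # key for the text before the first header
my %new = @new;                 # alternating keys and values

for my $k (sort keys %new) {
    printf "%-10s => %s", $k, $new{$k};
}
```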

Larry