Xref: utzoo comp.bugs.sys5:1176 comp.bugs.4bsd:1454 Path: utzoo!utgpu!watmath!att!dptg!ulysses!andante!alice!dmr From: dmr@alice.UUCP Newsgroups: comp.bugs.sys5,comp.bugs.4bsd Subject: Re: s/A*./B/g editor bug? Message-ID: <10036@alice.UUCP> Date: 20 Oct 89 03:39:35 GMT Organization: AT&T Bell Laboratories, Murray Hill NJ Lines: 55 The behavior of ed's s/b*a/c/g on aaaaa to yield cacacacacac is doubtless a bug, and quite old; it occurs in the Seventh Edition ed. It relates to handling of null strings. The problem ed is trying to overcome is shown by the command s/b*/c/g on aaaaa where the most desirable result is presumably cacacacacac where each null string before, inside, and after the subject matches /b*/ exactly once. But "each null string" is a bit slippery in practice. Ed generally handles the 'g' option of substitution by iterating the 's' command, with an advancing starting position, until it reaches the end of the subject string. The new starting position is just between the end of the replacement string from the last iteration and the end of the unconsumed part of the subject string (if the r.e. match succeeded), or one character further along in the subject (if the match at this position failed). When the algorithm suggested by this scheme is tried naively, the second example, 's/b*/c/g', blows up; it puts a 'c' at the beginning, then again finds the null string after the inserted 'c' and before the same, initial 'a'; the r.e. didn't consume any of the subject. The remedy for this misbehavior that ed chose was to force * expressions to fail to match empty strings at the beginning of the subject string on the second and subsequent iterations of the 's/.../.../g' processing. (If you have the code, this is the meaning of the 'locs' variable in advance().) So in the 's/b*a/c/g' example, a 'c' is put at the start; on the second iteration, the subject still begins with the first 'a', and the 'b*' fails on this initial 'a'. Then it advances; this time the substitution succeeds, because it is looking for 'b*a' not at the beginning. What you've discovered is that this remedy is also naive. It looks wiser to attack the problem head-on, by permitting the r.e. null string match, but recasting the 's/.../.../g' loop so that if the entire r.e. matches a null string immediately to the right of a string replaced in the preceding iteration, an advance through the subject string is forced. Dennis Ritchie dmr@research.att.com att!research!dmr