Path: utzoo!mnetor!uunet!husc6!rutgers!lll-lcc!ames!umd5!uvaarpa!mcnc!ece-csc!ncrcae!ncr-sd!hp-sdd!hplabs!hpda!hpcupt1!hpirs!wk
From: wk@hpirs.HP.COM (Wayne Krone)
Newsgroups: comp.bugs.sys5
Subject: Re: Bug in sed regexps ?
Message-ID: <3920004@hpirs.HP.COM>
Date: 28 Dec 87 23:39:27 GMT
References: <9578@santra.UUCP>
Organization: Hewlett Packard, Cupertino
Lines: 36

The behaviour noted is correct; it is not a bug in sed.  The apparent
problem can be reduced to the first line of the sed script:

    sed -e 's/^\([^.]*\)[^:]*:\([^	]*\)	\1/\2	\1/'

being processed against the third line of the input file:

    stdipc.3c:.TH STDIPC 3C "" "" HP-UX	ftok \- standard ...

which gives the result:

    .TH STDIPC 3C "" "" HP-UX	ftok \- standard ...

when the intent was for that line of the sed script to leave this line
of input unchanged.
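A minimal sketch reproducing the substitution, assuming Python's `re` module as a stand-in for sed's basic regular expressions (the `\( \)` groups become `( )` and the literal tab becomes `\t`). It is an illustration, not sed itself. Python's backtracking matcher and sed's POSIX matcher can choose different matches in general, but for this input line only one match exists at the start of the line, so both agree.

```python
import re

# The sed BRE  ^\([^.]*\)[^:]*:\([^<TAB>]*\)<TAB>\1  in Python's re syntax.
PATTERN = re.compile(r"^([^.]*)[^:]*:([^\t]*)\t\1")

line = 'stdipc.3c:.TH STDIPC 3C "" "" HP-UX\tftok \\- standard ...'

# sed's s/.../\2<TAB>\1/ replaces only the first match, hence count=1.
result = PATTERN.sub(r"\2\t\1", line, count=1)
print(result)
```

The "stdipc.3c:" prefix disappears, just as in the sed output quoted above.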

The first line of the sed script was intended to operate on patterns such
as:

    <string1><.><junk><:><string2><tab><string1>

and so it was expected that line 3 of the input file would be left
alone, because the obvious match for <string1>, "stdipc", does not
appear a second time in the input line after the <tab>.  However, the
regular expression also admits a non-obvious match: <string1> can be
zero characters (the null string), and a zero-length <string1> trivially
matches after the <tab>.  Stating the problem another way, while the
regular expression establishes that <string1> cannot extend past the
first ".", it does not prevent <junk> from matching characters before
the first ".", so <junk> is free to absorb all of "stdipc.3c".
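To make the decomposition visible, here is a small sketch that again uses Python's `re` as a stand-in for sed. Named groups mirror the <string1>, <junk> and <string2> labels used above. The extra `junk` group is added only for illustration and does not change what the pattern matches.

```python
import re

# Same pattern as the sed script, with each piece named after the
# <string1><.><junk><:><string2><tab><string1> description.
PATTERN = re.compile(
    r"^(?P<string1>[^.]*)(?P<junk>[^:]*):(?P<string2>[^\t]*)\t(?P=string1)"
)

line = 'stdipc.3c:.TH STDIPC 3C "" "" HP-UX\tftok \\- standard ...'
m = PATTERN.match(line)

# <string1> ends up empty, and <junk> swallows everything up to the ":".
print(repr(m.group("string1")), repr(m.group("junk")), repr(m.group("string2")))
```

"stdipc" is the only non-empty candidate for <string1>, and it fails at the back-reference after the tab. The only remaining match is therefore the empty <string1>, with <junk> covering "stdipc.3c".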

The solution, as you have already noted, is to establish an explicit
boundary between the <string1> and <junk> expressions.
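One possible boundary, sketched here as an assumption rather than as the exact fix from the original thread, is a literal "." between the two groups. In sed that would read `s/^\([^.]*\)\.[^:]*:\([^<TAB>]*\)<TAB>\1/\2<TAB>\1/`. Because <string1> is anchored at "^" and must be followed immediately by ".", it has to run all the way to the first ".", so it can no longer shrink to the null string. The Python sketch below compares the two patterns. The `wanted` input line and the `substitute` helper are hypothetical.

```python
import re

# Original pattern: <junk> may swallow characters before the first ".".
ORIGINAL = re.compile(r"^([^.]*)[^:]*:([^\t]*)\t\1")
# With a literal "." after <string1>, <string1> must extend to the first ".".
FIXED = re.compile(r"^([^.]*)\.[^:]*:([^\t]*)\t\1")

def substitute(pattern, line):
    """Apply sed-style s/.../\\2<TAB>\\1/ to the first match only."""
    return pattern.sub(r"\2\t\1", line, count=1)

# Line 3 from the report: should be left unchanged.
unwanted = 'stdipc.3c:.TH STDIPC 3C "" "" HP-UX\tftok \\- standard ...'
# Hypothetical line that really does repeat <string1> after the tab.
wanted = 'ftok.3c:.TH STDIPC 3C "" "" HP-UX\tftok \\- standard ...'

print(substitute(ORIGINAL, unwanted))  # prefix stripped (the reported behaviour)
print(substitute(FIXED, unwanted))     # unchanged
print(substitute(FIXED, wanted))       # "ftok.3c:" prefix stripped
```

With the boundary in place, line 3 no longer matches, while a line that really repeats <string1> after the tab is still rewritten.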

Wayne Krone
Hewlett-Packard