Path: utzoo!attcan!uunet!mcvax!hp4nl!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.unix.wizards
Subject: Re: This is strange...
Summary: Not so strange
Keywords: sed awk pipe
Message-ID: <900@philmds.UUCP>
Date: 23 Dec 88 11:38:43 GMT
References: <1652@ektools.UUCP>
Reply-To: leo@philmds.UUCP (Leo de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 111

In article <1652@ektools.UUCP> mcapron@ektools.UUCP (M. Capron) writes:
|
|Here is some bizarreness I found.  Below is a subset of a Bourne Shell script I
|am writing on a Sun 3/60 running SunOS 4.0.  This segment generates dependency
|lists for makefiles.  Note that the egrep brackets should contain a space and
|a tab.
|
|#!/bin/sh
|for i in *.c
|do
|#Place a list of include files in $incs separated by spaces.
|#CODE A or CODE B goes here.
|	echo "$i : $incs"
|done
|
|CODE A: This works.
|incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}'`
|incs=`echo "$incs" | sed 's/"//g'`
|
|CODE B: This does not work.
|incs=`egrep '^#[         ]*include[     ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`
|
|With CODE B, $incs comes out to be nil.  I can't figure out what the difference
|is, nor do I have the patience to play with it any further.  I present it as an
|oddity to any interested parties. 

There certainly is a difference (although it may not be very obvious).
The awk script does not append a newline to the header file list it
generates. In the case of CODE A that is not a problem: echo sends one
down the pipe to sed. In the case of CODE B, sed is attached directly
to awk's output, so it never gets a newline. A traditional sed needs the
newline as its input record marker, so it exits without having
recognized a complete input record, and hence produces no output. (Not
every sed behaves this way; GNU sed, for instance, also processes an
unterminated last line.)

The solution is simple: add an END action to the awk script that prints
a newline, as follows:
CODE C: This also works.
incs=`egrep '^#[ 	]*include[ 	]*"' "$i" |
      awk '{printf "%s ", $2} END {print ""}' | sed 's/"//g'`
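To try CODE C in isolation, here is a minimal sketch that wraps it in a function and runs it on a throwaway file. The function name incs_of and the sample file are hypothetical; grep -E replaces egrep, and the POSIX class [[:blank:]] replaces the literal space and tab. The END action prints an empty string rather than using a bare print, since a bare print outputs $0, which many awk implementations still hold as the last record inside END.

```shell
#!/bin/sh
# Hypothetical helper wrapping CODE C: prints the quoted #include names
# of file $1, blank separated, on one newline-terminated line.
incs_of() {
	grep -E '^#[[:blank:]]*include[[:blank:]]*"' "$1" |
	awk '{printf "%s ", $2} END {print ""}' |
	sed 's/"//g'
}

# Try it on a throwaway source file.
tmp=`mktemp -d` || exit 1
printf '#include <stdio.h>\n#include "a.h"\n#include "b.h"\n' > "$tmp/demo.c"
incs=`incs_of "$tmp/demo.c"`
rm -rf "$tmp"
echo "demo.c : $incs"      # prints: demo.c : a.h b.h
```

One caveat shared with CODE A to C: for a line such as # include "x.h" the egrep pattern matches, but awk's $2 is then the word include rather than the header name. The sed-based versions extract the text between the quotes and avoid this.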

Furthermore, I would like to make some remarks about the script; maybe they
are of some use to someone.

1) A three-process pipeline seems a little overdone for such a simple
task; it all lies well within the capabilities of a single process,
e.g. sed:

CODE D: This also works.
incs=`sed -n '
/^[ 	]*#[ 	]*include[ 	]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /gp
}' $i`
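How CODE D collects the names: H appends a newline plus the current line to the hold space, so after two matches the hold space holds "\na.h\nb.h"; at the last line ($) g copies it back, and s/\n/ /gp turns every newline into a blank and prints the result. The sketch below uses a hypothetical function name incs_sed and [[:blank:]] in place of the literal space and tab. It shows the resulting leading blank, and that a file without quoted includes yields nothing, since the p flag only prints after a successful substitution.

```shell
#!/bin/sh
# CODE D as a hypothetical function reading C source from stdin.
# Hold space after each H: "" -> "\na.h" -> "\na.h\nb.h"
incs_sed() {
	sed -n '
/^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /gp
}'
}

# Made-up input: an indented "# include" and a <...> include (ignored).
incs=`printf '%s\n' '#include "a.h"' '  #  include "b.h"' '#include <stdio.h>' | incs_sed`
echo "demo.c :$incs"      # prints: demo.c : a.h b.h

# No quoted includes: no substitution succeeds, so nothing is printed.
none=`printf 'int x;\n' | incs_sed`
```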

It is even possible to do without the echo, the backquotes and the
variable incs, since sed can produce the whole output line as well:

CODE E: This also works (omit the echo in this case).
sed -n '
/^[ 	]*#[ 	]*include[ 	]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /g
    s/^/'$i' : /p
}' $i
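A sketch of CODE E placed inside the original loop, so that sed writes each complete "file.c : headers" line itself. The function name deps_e and its directory argument are assumptions for the demo. Because the file name is spliced into the sed script, names containing / or & would break the s command.

```shell
#!/bin/sh
# Hypothetical wrapper: run CODE E for every *.c file in directory $1.
deps_e() {
	(
	cd "$1" || exit 1
	for i in *.c
	do
		sed -n '
/^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /g
    s/^/'"$i"' : /p
}' "$i"
	done
	)
}

# Try it on two throwaway files.
tmp=`mktemp -d` || exit 1
printf '#include "a.h"\n#include "b.h"\nint x;\n' > "$tmp/one.c"
printf 'int y;\n' > "$tmp/two.c"
out=`deps_e "$tmp"`
rm -rf "$tmp"
printf '%s\n' "$out"
```

The first line comes out as "one.c :  a.h b.h", with two blanks after the colon (one from the inserted text, one from the leading newline in the hold space). A file without quoted includes still gets "two.c : ", since s/^/.../ always succeeds.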

The other points are more of a C issue, but I will present them here
since they also affect the script:

2) When searching for '#include' lines one should allow leading white space.
I could find nothing that forbids white space before the #, and some
programmers even use it to clarify nested conditionals (with #ifdef).
The CODE D and E examples allow leading white space.
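To see the effect of point 2, a small sketch that counts how many #include lines each pattern finds in a made-up fragment with an indented include inside #ifdef. The POSIX class [[:blank:]] stands in for the literal space and tab:

```shell
#!/bin/sh
# Made-up fragment: the nested include is indented for readability.
sample='#include "a.h"
#ifdef DEBUG
  #include "debug.h"
#endif'

# Pattern from CODE A to C: the # must be in column 1.
strict=`printf '%s\n' "$sample" | grep -c -E '^#[[:blank:]]*include[[:blank:]]*"'`

# Pattern from CODE D and E: blanks may also precede the #.
loose=`printf '%s\n' "$sample" | grep -c -E '^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"'`

echo "strict: $strict, loose: $loose"   # prints: strict: 1, loose: 2
```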

3) Source files do not depend on the header files they name. This
is a common mistake. To understand why, realize that the source file
does not change when a header file is modified. The object file,
however, must be rebuilt, since its code is generated from the expanded
source file (the output of the preprocessor phase).
So the dependencies should contain lines like:

    file.o : incl.h   (or perhaps: file.o : file.c incl.h)

instead of

    file.c : incl.h

The easiest way is to strip off the .c suffix and use the base name
(anchoring the pattern so that only a trailing .c is removed):

for i in `ls *.c | sed 's/\.c$//'`
do
#CODE X goes here, using file $i.c
	echo "$i.o : $incs"
done
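Combining points 1 to 3, a minimal sketch of a generator that writes object-file rules using the CODE D script. The function name mkdeps is hypothetical, basename is used here as one way of stripping the suffix, and the rules take the "file.o : file.c incl.h" form mentioned above:

```shell
#!/bin/sh
# Hypothetical generator: one "name.o : name.c headers" rule per *.c
# file in the current directory; headers found with the CODE D script.
mkdeps() {
	for f in *.c
	do
		[ -f "$f" ] || continue   # no *.c files: the glob stays literal
		i=`basename "$f" .c`      # strip the .c suffix
		incs=`sed -n '
/^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"/{
    s/[^"]*"\([^"]*\)".*/\1/
    H
}
${
    g
    s/\n/ /gp
}' "$f"`
		echo "$i.o : $i.c$incs"   # $incs is empty or starts with a blank
	done
}

# Try it on two throwaway files.
tmp=`mktemp -d` || exit 1
printf '#include "a.h"\n#include "b.h"\nint x;\n' > "$tmp/one.c"
printf 'int y;\n' > "$tmp/two.c"
out=`cd "$tmp" && mkdeps`
rm -rf "$tmp"
printf '%s\n' "$out"
# one.o : one.c a.h b.h
# two.o : two.c
```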

4) Be aware that the script does not handle header files that include
other header files. An object file depends on all (nested) included
files, not only on those named directly in its source file. To handle
this well you may also want to detect illegal recursion; this is not
easy in the case of conditional inclusion, since whether a file is
included depends on preprocessor expressions.
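For point 4, one possible approach (ignoring conditional inclusion entirely) is a worklist that keeps scanning newly found headers and remembers which ones it has visited, so that a cycle such as a.h including b.h including a.h terminates. A sketch under stated assumptions: the helper names direct_incs and all_incs are hypothetical, headers are looked up in the current directory only, names contain no blanks, and every include is followed regardless of #if/#ifdef, so the result may be a superset of what the compiler actually reads. The shell variables used are global.

```shell
#!/bin/sh
# Hypothetical helper: quoted headers named directly in file $1,
# one per line (same extraction idea as CODE D).
direct_incs() {
	sed -n 's/^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: every header reachable from $1 through nested
# quoted includes, each listed once, in breadth-first order.
all_incs() {
	seen=" "                 # visited headers, blank separated
	todo=`direct_incs "$1"`  # headers still to be scanned
	while [ -n "$todo" ]
	do
		next=
		for h in $todo
		do
			case "$seen" in
			*" $h "*) continue ;;   # visited already: breaks include cycles
			esac
			seen="$seen$h "
			[ -f "$h" ] || continue # not found here: listed, not scanned
			sub=`direct_incs "$h"`
			next="$next $sub"
		done
		todo=$next
	done
	echo $seen               # unquoted on purpose: trims the outer blanks
}

# Try it on a small tree with an a.h <-> b.h cycle.
tmp=`mktemp -d` || exit 1
cd "$tmp" || exit 1
printf '#include "a.h"\nint main(void) { return 0; }\n' > main.c
printf '#include "b.h"\n' > a.h
printf '#include "a.h"\n#include "c.h"\n' > b.h
printf 'int c;\n' > c.h
deps=`all_incs main.c`
cd / && rm -rf "$tmp"
echo "main.o : main.c $deps"   # prints: main.o : main.c a.h b.h c.h
```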

Hope this helps -
                    Leo.