Path: utzoo!attcan!uunet!mcvax!hp4nl!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.unix.wizards
Subject: Re: This is strange...
Summary: Not so strange
Keywords: sed awk pipe
Message-ID: <900@philmds.UUCP>
Date: 23 Dec 88 11:38:43 GMT
References: <1652@ektools.UUCP>
Reply-To: leo@philmds.UUCP (Leo de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 111

In article <1652@ektools.UUCP> mcapron@ektools.UUCP (M. Capron) writes:
|
|Here is some bizareness I found.  Below is a subset of a Bourne Shell script I
|am writing on a Sun 3/60 running SunOS 4.0.  This segment generates dependency
|lists for makefiles.  Note that the egrep brackets should contain a space and
|a tab.
|
|#!/bin/sh
|for i in *.c
|do
|#Place a list of include files in $incs seperated by spaces.
|#CODE A or CODE B goes here.
|	echo "$i : $incs"
|done
|
|CODE A: This works.
|incs=`egrep '^#[ ]*include[ ]*"' $i | awk '{printf "%s ", $2}'`
|incs=`echo "$incs" | sed 's/"//g'`
|
|CODE B: This does not work.
|incs=`egrep '^#[ ]*include[ ]*"' $i | awk '{printf "%s ", $2}' | sed 's/"//g'`
|
|With CODE B, $incs comes out to be nil.  I can't figure out what the difference
|is, nor do I have the patience to play with it any furthing.  I present it as an
|oddity to any interested parties.

There certainly is a difference (although it may not be very obvious).
The awk script does not append a newline to the header file list it
generates. In the case of CODE A that is not a problem: echo sends one
down the pipe to sed. In the case of CODE B, sed is attached directly to
awk's output, so it never gets a newline. And since sed needs a newline
as 'input record marker', it exits without having recognized a valid
input record, and hence supplies no output.

The solution is simple: add a trailing print statement to the awk
script, as follows:

CODE C: This also works.
incs=`egrep '^#[ ]*include[ ]*"' $i | awk '{printf "%s ", $2} END {print ""}' | sed 's/"//g'`

(The empty string in print "" matters: many awks still hold the last
input record in $0 inside END, so a bare print would append that whole
#include line to the list.)

Furthermore, I would like to make some remarks about the script; maybe
they are of some use to someone.

1) A 3-process pipeline seems a little overdone for such a simple task;
it all lies well within the capabilities of a single process, e.g. sed:

CODE D: This also works.

incs=`sed -n '
/^[ ]*#[ ]*include[ ]*"/{
	s/[^"]*"\([^"]*\)".*/\1/
	H
}
${
	g
	s/\n/ /gp
}' $i`

It is even possible to do without the echo, the backquotes and incs,
since sed can handle that as well:

CODE E: This also works (omit the echo in this case).

sed -n '
/^[ ]*#[ ]*include[ ]*"/{
	s/[^"]*"\([^"]*\)".*/\1/
	H
}
${
	g
	s/\n/ /g
	s/^/'$i' : /p
}' $i

The remaining points are more of a C issue, but I present them here
since the script deals with C sources:

2) When searching for '#include' lines, one should allow leading white
space. I could find nothing that forbids white space before the #, and
some programmers even use it to clarify nested conditionals (#ifdef).
The CODE D and E examples allow leading white space.

3) Source files do not depend on the header files they name; this is a
common mistake. To understand why, realize that the source file does
not change when a header file is modified. The object file, however,
does, since its code is generated from the expanded source file (the
output of the preprocessor phase). So the dependencies should contain
lines like

	file.o : incl.h

(or perhaps: file.o : file.c incl.h) instead of

	file.c : incl.h

The easiest way is to strip off the .c and use the file name without
extension:

for i in `echo *.c|sed 's/\.c//g'`
do
	#CODE X goes here, using file $i.c
	echo "$i.o : $incs"
done

4) Be aware that the script does not handle header files that include
other header files. Note that an object (amongst others) depends on all
(nested) included files.
To handle this well, you may perhaps also want to detect illegal
recursion; this is not easy in the case of conditional inclusion, since
that depends on preprocessor expressions.

Hope this helps -
		Leo.
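A small sketch of the missing-newline explanation, assuming a POSIX sh: grep -E stands in for egrep, [[:blank:]] stands in for the space-and-tab brackets, and the sample file contents are made up. It runs the awk stage of CODE B and of CODE C and counts the newlines each one writes. It checks awk's output directly rather than sed's result, because some current seds (GNU sed, for one) do process an unterminated last line, so CODE B may appear to work there.

```shell
#!/bin/sh
# Sketch only: a hypothetical sample "file" instead of a real .c file.
# [[:blank:]] is the POSIX class for "space or tab".
sample='#include <stdio.h>
#include "a.h"
#include "b.h"
int main(void) { return 0; }'

pat='^#[[:blank:]]*include[[:blank:]]*"'

# wc -l counts newline characters; tr strips the padding some wc's add.
count_newlines() {
    wc -l | tr -d ' '
}

# CODE B's awk stage: printf never writes a newline.
without=`printf '%s\n' "$sample" | grep -E "$pat" |
    awk '{printf "%s ", $2}' | count_newlines`

# CODE C's awk stage: print "" (not a bare print, since $0 still holds
# the last record inside END on most modern awks) adds one newline.
with=`printf '%s\n' "$sample" | grep -E "$pat" |
    awk '{printf "%s ", $2} END {print ""}' | count_newlines`

# The complete CODE C pipeline; the backquotes strip the final newline.
incs=`printf '%s\n' "$sample" | grep -E "$pat" |
    awk '{printf "%s ", $2} END {print ""}' | sed 's/"//g'`

echo "newlines without END: $without"
echo "newlines with END:    $with"
echo "incs=[$incs]"
```

It reports 0 newlines for the CODE B stage and 1 for the CODE C stage; incs keeps a trailing blank from the last printf, which is harmless in a dependency line.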
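Points 3 and 4 together could be sketched roughly as follows. This is not the author's script, and it rests on assumptions of its own: headers are looked up in the current directory only, <...> includes are skipped, #if/#ifdef conditions are not evaluated (the hard part mentioned above), and file names contain no blanks. It uses basename to strip the .c and keeps a list of visited headers, so headers that include each other cannot make it loop forever. The function names direct_includes, all_includes and make_deps are hypothetical.

```shell
#!/bin/sh
# Sketch: emit "file.o : file.c headers..." lines, following nested
# #include "..." directives.  Assumptions: headers live in the current
# directory, <...> includes are ignored, #if/#ifdef is NOT evaluated,
# and no file name contains blanks.  All function names are hypothetical.

# Print the quoted includes named directly in file $1, one per line.
# White space is allowed before and after the #.
direct_includes() {
    sed -n 's/^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Print every header reachable from file $1, each one exactly once.
# The "seen" list stops the loop when headers include each other.
all_includes() {
    seen=
    todo=`direct_includes "$1"`
    while [ -n "$todo" ]; do
        next=
        for h in $todo; do
            case " $seen " in
            *" $h "*) continue ;;        # already visited
            esac
            seen="$seen $h"
            if [ -r "$h" ]; then
                more=`direct_includes "$h"`
                next="$next $more"
            fi
        done
        todo=$next
    done
    echo $seen                           # unquoted: collapses extra blanks
}

# One dependency line per .c file given as an argument.
make_deps() {
    for c in "$@"; do
        [ -f "$c" ] || continue          # e.g. an unmatched *.c pattern
        base=`basename "$c" .c`
        hdrs=`all_includes "$c"`
        echo "$base.o : $c $hdrs"
    done
}

make_deps *.c
```

For a hypothetical m.c that includes a.h, where a.h includes b.h and b.h includes a.h again, make_deps m.c prints "m.o : m.c a.h b.h" and stops instead of recursing.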