Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!hplabs!hpcc01!hpcuhb!hpda!hpcupt1!jonb
From: jonb@hpcupt1.HP.COM (Jon Bayh)
Newsgroups: comp.sys.hp
Subject: Re: HP-UX 7.0 problems with ps(1) and awk(1) in pipe
Message-ID: <-286539949@hpcupt1.HP.COM>
Date: 12 Aug 90 21:39:15 GMT
References: <90222.183205QQ11@LIVERPOOL.AC.UK>
Organization: Hewlett Packard, Cupertino
Lines: 51

Alan Thew of the University of Liverpool Computer Laboratory writes:

> The following line of code is used to
> find the old PID.
>
> set exists=`ps -uqq11 | awk '$4 == "getus.csh" && $1 != x { print $1 }' x=$$ -`
>
> This looks for the PIDs of the program and finds the older one if it
> exists (if it is found kill(1) is used).
>
> This worked fine at HP-UX 3.1 but at 7.0 a "spurious" PID is found
> (presumably as a result of executing part of the above pipe). The
> resulting kill(1) fails since the process is already dead.

Mr. Thew, I'm afraid that you are running into a race condition in the
pipe code above, as you surmised. The csh is spawning off two
sub-processes, one that will become the ps process, and one that will
become the awk process. Between the time of the fork() and the time
that the forked csh performs the exec(), however, the process inherits
the name of the original process, "getus.csh".

The race condition occurs when the ps process is exec'ed first, and
begins to run before the awk process has had a chance to perform its
exec. During this window, the script will find three processes that
are named "getus.csh"---the old one that you want to kill, the parent
of the above script (which will be discarded by the $1 != x case
above), and the forked csh process that will eventually become the awk
process.

This race condition also existed in 3.1, but the 7.0 'ps' is much, much
faster than the 3.1 'ps', and that probably causes the race to show up
more.
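To make the window concrete, here is a rough sketch that feeds the same
awk filter a canned ps listing of the kind it might see during the
race. The PIDs, ttys and times are invented for illustration, the
function names are mine, and I use plain sh rather than csh only so the
sketch is easy to try; it is not a capture from a real 7.0 system.

```shell
#!/bin/sh
# Canned stand-in for "ps -uqq11" output taken during the race window.
# Every PID, tty and time below is made up for illustration.
snapshot() {
cat <<'EOF'
  PID TTY      TIME COMMAND
  812 ttyp1    0:03 getus.csh
 1501 ttyp2    0:00 getus.csh
 1502 ttyp2    0:00 ps
 1503 ttyp2    0:00 getus.csh
EOF
}

# The same filter as in the original pipe. The first argument plays the
# role of the csh's $$ (the PID of the script doing the search).
find_old() {
    awk '$4 == "getus.csh" && $1 != x { print $1 }' x="$1" -
}

# 812  is the old getus.csh that should be killed.
# 1501 is the searching script itself and is dropped by $1 != x.
# 1503 is the forked csh that has not yet exec'ed awk: the spurious PID.
snapshot | find_old 1501
# prints:
# 812
# 1503
```

By the time kill(1) is run on the second PID, that child has long since
become awk and exited, which matches the failure described.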

One simple (but rather kludgy) fix to the problem is to change the
"$1 != x" comparison above to something like "($1 < x || $1 > x+5)".
That takes advantage of the fact that the shell script and its ps and
awk subprocesses will probably be spawned off quickly and with
sequential PIDs. It may not work if the system is busy spawning
processes, if the system has some long-running processes around that
happen to match the PIDs that are being allocated, if the older
getus.csh happens to match the PIDs being allocated, or if a future
system does not allocate PIDs sequentially.

Another, better, solution would be to make the existence test a
separate script with a different name, perhaps "findgetus.csh". That
way, its name will not conflict with the name of the parent script.
Since the parent of the ps and awk processes would have the name
"findgetus.csh", the forked children would carry that name until they
exec, so they wouldn't match the target string and the race condition
wouldn't matter. The parent could pass its PID as a parameter to the
find script so that the find script would avoid the parent shell
process.

Jon Bayh
jonb@hpda
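
P.S. For what it's worth, here is a rough sketch of both fixes run
against canned ps listings. All PIDs, ttys and times are invented, the
function names are mine, and it is written in plain sh rather than csh
only so that it is easy to try. Treat it as an illustration under those
assumptions, not as a tested replacement for your script.

```shell
#!/bin/sh
# Fix 1 (the kludge): ignore every PID from x through x+5.
find_old_kludge() {
    awk '$4 == "getus.csh" && ($1 < x || $1 > x+5) { print $1 }' x="$1" -
}

# Fix 2 keeps the original filter but runs it from a separately named
# script; the argument is the PID passed in by the parent getus.csh.
find_old() {
    awk '$4 == "getus.csh" && $1 != x { print $1 }' x="$1" -
}

# Made-up listing as seen when getus.csh (PID 1501) runs the pipe itself
# and the awk child (1503) has not exec'ed yet.
race_snapshot() {
cat <<'EOF'
  PID TTY      TIME COMMAND
  812 ttyp1    0:03 getus.csh
 1501 ttyp2    0:00 getus.csh
 1502 ttyp2    0:00 ps
 1503 ttyp2    0:00 getus.csh
EOF
}

# Made-up listing as seen when getus.csh (1501) starts findgetus.csh
# (1502), whose not-yet-exec'ed awk child (1504) now carries that name.
renamed_snapshot() {
cat <<'EOF'
  PID TTY      TIME COMMAND
  812 ttyp1    0:03 getus.csh
 1501 ttyp2    0:00 getus.csh
 1502 ttyp2    0:00 findgetus.csh
 1503 ttyp2    0:00 ps
 1504 ttyp2    0:00 findgetus.csh
EOF
}

race_snapshot | find_old_kludge 1501    # prints 812
renamed_snapshot | find_old 1501        # prints 812
```

The first call only works because the invented PIDs happen to be
sequential; the second does not depend on PID allocation at all.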