Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site umcp-cs.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!caip!lll-crg!gymble!umcp-cs!chris
From: chris@umcp-cs.UUCP (Chris Torek)
Newsgroups: net.unix-wizards,net.unix
Subject: Re: UNIX question
Message-ID: <2548@umcp-cs.UUCP>
Date: Fri, 13-Dec-85 17:06:27 EST
Article-I.D.: umcp-cs.2548
Posted: Fri Dec 13 17:06:27 1985
Date-Received: Sun, 15-Dec-85 00:18:30 EST
References: <156@uw-june> <974@ccice5.UUCP>
Organization: U of Maryland, Computer Science Dept., College Park, MD
Lines: 70
Keywords: zombies
Xref: watmath net.unix-wizards:16106 net.unix:6630

In article <974@ccice5.UUCP> ahb@ccice5.UUCP (Al Brumm) writes:
> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>	signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel. One question: Is SIGCLD always reset to SIG_DFL on exec? If not, since ignored signals normally remain ignored, it could break other programs which expect to collect children; and programs that ignore SIGCLD would have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status. However, you could examine the exit status by doing the
> following:
>	int
>	sigcld()
>	{
>		int pid, status;
>		pid = wait(&status);
>		...
>	}
>	main()
>	{
>		int (*sigcld)();
>
>		signal(SIGCLD, sigcld);
>	}

Well, the `int (*sigcld)()' declaration is wrong and (in this case) unnecessary; it should be `int sigcld()' if anything. As written it declares an uninitialized local pointer named sigcld that hides the function, so signal() would be handed garbage. But that is not all that is amiss.

In V7, 3BSD, and 4BSD, and I suspect also in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well, signals are not queued, and without the `jobs library' of 4.1BSD, or the signal facilities of 4.2, this code cannot be made to operate reliably. It *will fail*, someday, no doubt at the worst possible moment.
The problem is that several children may exit in quick succession. Only one SIGCLD signal will be delivered, since the parent process will (just this once) not manage to run before all have exited. The sigcld handler has no way of determining how many children are to be processed.

In 4.1BSD and later, the solution is a new `system call', wait3(). This call takes two option flags, WNOHANG and WUNTRACED. WNOHANG tells the kernel not to block waiting for children that are still running; instead, wait3 returns 0 when none of the remaining children has exited. The signal handler can therefore call wait3 in a loop until it returns 0, at which point it has collected every child that has exited, no matter how many signals were merged into one. (WUNTRACED exists only for C-shell style job control with stopped processes, and is irrelevant here.)

Unfortunately, this solution is still incomplete. There are race conditions unless the child exit signal is withheld (but not ignored) for the duration of the child collection routine, and unless it can also be withheld during process creation (in case the created process exits before the parent finishes updating its data structures). Both are possible under the 4.1BSD `jobs' library and in all 4.2 and 4.3 systems.

Anyway, what it all boils down to is that process control is unreliable in many versions of Unix, but can be made reliable in 4.1, 4.2, and 4.3BSD. If there is any way to reliably handle process exit and `job control' style processing in System III and System V, I am not aware of it---though that should be unsurprising since I have never used them. If it is possible in the latest AT&T Unixes, I would like to know how.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs
ARPA:	chris@mimsy.umd.edu