Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: Notesfiles; site hpfcls.UUCP
Path: utzoo!watmath!clyde!cbosgd!ukma!psuvm.bitnet!psuvax1!burdvax!sdcrdcf!hplabs!hpfcdc!hpfcls!hpfcla!rml
From: rml@hpfcla.UUCP
Newsgroups: net.unix
Subject: Re: UNIX question
Message-ID: <12600002@hpfcls.UUCP>
Date: Wed, 11-Dec-85 18:49:00 EST
Article-I.D.: hpfcls.12600002
Posted: Wed Dec 11 18:49:00 1985
Date-Received: Mon, 23-Dec-85 00:18:32 EST
References: <156@uw-june.UUCP>
Organization: 11 Dec 85 16:49:00 MST
Lines: 81

> > My question: Is there any way to kill off these zombies so I can get
> > more processes ? Or, failing that, is there any other
> > way to do what I want ?
> > ...
>
> Or, you could keep track of the child PIDs and probe
> their state every so often via kill() with "signal" 0,
> waiting on those that return failure from the kill().

This will work on 4.x-based systems, but not on most others. Kill does
not support "signal" 0 in many earlier systems. In System III and V,
kill does support "signal" 0, but does not fail on attempts to send
signals to zombies.

> A clean way to handle this problem on Sys3 was to use the following
> system call in the parent process:
>     signal(SIGCLD, SIG_IGN);
>
> Then when a child process exited, a zombie would not be created.

This applies to System V as well. It is not, however, part of the SVID.

> Is SIGCLD always reset to SIG_DFL on exec? If not, since ignored
> signals normally remain ignored, it could break other programs
> which expect to collect children; and programs that ignore SIGCLD
> would have to carefully un-ignore it just after forks.

SIGCLD is not reset from SIG_IGN to SIG_DFL on exec. Yes, this means
that programs which ignore it need to be careful before spawning other
programs. The same is true, by the way, of programs which mask out
signals in BSD systems.
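
A minimal sketch of what "being careful before spawning" could look like
for a parent that ignores SIGCLD: the child puts the signal back to
SIG_DFL between fork and exec, so the new program can collect its own
children. The helper name spawn_with_default_sigcld is hypothetical, and
the fallback definition of SIGCLD as SIGCHLD is an assumption for systems
that only provide the BSD/POSIX name.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SIGCLD
#define SIGCLD SIGCHLD  /* assumption: same signal under its BSD/POSIX name */
#endif

/*
 * Hypothetical helper: fork, restore SIGCLD to SIG_DFL in the child,
 * then exec argv[0] with the given argument vector.
 * Returns the child's pid in the parent, or -1 if fork() fails.
 */
pid_t spawn_with_default_sigcld(char *const argv[])
{
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);  /* caller must name a program */

    pid = fork();
    if (pid == 0) {
        /* Child: undo the inherited SIG_IGN before the new program starts. */
        signal(SIGCLD, SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);  /* reached only if the exec failed */
    }
    return pid;
}
```

A parent that has already called signal(SIGCLD, SIG_IGN) would use this
in place of a bare fork/exec pair; the parent's own disposition is left
untouched.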

> In V7, 3BSD, and 4BSD, and I suspect also
> in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
> signals are not queued, and without the `jobs library' of 4.1BSD,
> or the signal facilities of 4.2, this code cannot be made to operate
> reliably. It *will fail*, someday, no doubt at the worst possible
> moment.
>
> The problem is that several children may exit in quick succession.
> Only one SIGCLD signal will be delivered, since the parent process
> will (just this once) not manage to run before all have exited.
> The sigcld handler has no way of determining how many children are
> to be processed.

It turns out that SIGCLD can be used reliably in System III and V. What
is missing from the example is a call within the signal handler to
re-install itself.

> int
> sigcld()
> {
>     int pid, status;
>     pid = wait(&status);
>     ...
>>>   signal(SIGCLD, sigcld);    /* add this line */
> }

The signal(2) system call checks to see if any zombie child(ren) are
present and sends the calling process another SIGCLD if there are. The
signal handler is thus invoked recursively, once per zombie. Note that
the reinstallation of the handler must follow the call to wait, or
infinite recursion results.

Unfortunately in System III SIGCLD was not reset-when-caught, so this
call might have been left out, allowing children to be missed. This was
changed in System V; SIGCLD is reset to SIG_DFL when caught. Note that
there is no loss of reliability from the reset to SIG_DFL; since SIGCLD
is ignored by default, this is equivalent to masking out the signal
until the handler is reinstalled.

Unfortunately both System III and V fail to document these semantics of
signal(2), and instead have an incorrect explanation on the signal(2)
page which states that SIGCLD signals are queued internally. We at HP
implemented some systems (HP9000 series 500 releases <= 4.02) which
queued the signals as AT&T documents; current HP systems are all
compatible with the System V code.
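
For comparison with the re-installing handler above, the quoted concern
(several children exiting for one delivered signal) can also be handled
by reaping in a non-blocking loop. The sketch below is not the System V
mechanism described in this article: it uses waitpid(-1, &status,
WNOHANG), the later POSIX equivalent of wait3(&status, WNOHANG, NULL),
and is meant for systems that provide it. The names reap_exited_children
and spawn_and_reap are hypothetical, as are the 10 ms polling interval
and the 500-try limit.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict compilers */

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: collect every child that has already exited,
 * without blocking. Returns the number of children reaped.
 */
int reap_exited_children(void)
{
    int status;
    int reaped = 0;

    /* waitpid() yields 0 if children exist but none has exited yet,
     * and -1 if there are no children left; both end the loop. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/*
 * Hypothetical demonstration: start n children that exit at once, then
 * poll until all started children are collected. Returns the total
 * reaped, which is less than n if a fork fails or polling gives up.
 */
int spawn_and_reap(int n)
{
    struct timespec pause_10ms = {0, 10 * 1000 * 1000};
    int started = 0, total = 0, tries;

    assert(n >= 0);

    /* An inherited SIG_IGN would keep zombies from being left to reap. */
    signal(SIGCHLD, SIG_DFL);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);   /* child: exit immediately */
        if (pid > 0)
            started++;
    }
    for (tries = 0; tries < 500 && total < started; tries++) {
        total += reap_exited_children();
        if (total < started)
            nanosleep(&pause_10ms, NULL);
    }
    return total;
}
```

Because the loop keeps calling waitpid() until nothing more is ready, it
does not matter how many exits a single signal delivery stands for;
reap_exited_children() could equally be called from a handler or from
the parent's main loop.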

BTW, I find BSD's wait3 with WNOHANG to be a more intuitive mechanism.

			Bob Lenk
			{hplabs, ihnp4}!hpfcla!rml