Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!husc6!rice!sun-spots-request
From: pixar!rta@ucbvax.berkeley.edu (Rick Ace)
Newsgroups: comp.sys.sun
Subject: Re: Reaping zombie processes
Keywords: SunOS
Message-ID: <3335@pixar.UUCP>
Date: 30 Mar 89 23:25:45 GMT
References: <8902071950.AA18464@helios> <412@odin.cs.hw.ac.uk> <9084@elsie.UUCP>
Sender: usenet@rice.edu
Organization: Pixar -- Marin County, California
Lines: 56
Approved: Sun-Spots@rice.edu
Original-Date: 16 Mar 89 19:01:17 GMT
X-Sun-Spots-Digest: Volume 7, Issue 217, message 3 of 11

Here's the lowdown on exiting and zombie processes, circa SunOS 3.5. It may or may not be different under 4.0.

Process exit begins when 1) the process exits voluntarily via the "exit" syscall, or 2) when it is forced to do so by an uncaught signal. The kernel enters a routine called exit() [those of you with source can sing along, the rest just have to believe me :-].

Upon entering exit() (the kernel's exit(), that is), the kernel sets the SWEXIT flag in the struct proc of the process. This flag advises the paging and swapping logic that the process is on its way out and should be held in core so its demise will be quick.

The next step taken is to release the user virtual memory occupied by the process. This encompasses the text, data, and stack segments, but not the kernel's "u. area" for the process (yet).

Now the kernel runs through all open file descriptors, closing each one. This can result in calls to the "close" routines within device drivers. The drivers are at liberty to suspend the process if they so choose (for example, a tty driver may suspend the process until all characters in the output queue have been delivered to the hardware). Each driver is unique in its behavior, so the reasons for suspending a process will vary.
One would hope that the programmer who coded the driver would implement a timeout, which would give up and resume the user process after a reasonable amount of time, but unfortunately this is more the exception than the rule.

If a device driver should choose to suspend the process, "ps" will report the process as "exiting". In this case, the WCHAN column of the "ps" display will, in an obscure way, reflect the event the device driver is waiting for before it wakes the process from its sleep. When "ps" reports a process as "exiting", the process is most likely delayed in the close-the-file-descriptors phase of exiting.

After all of the file descriptors are closed, the kernel then discards the page tables and "u. area" of the process, and places the process in the "zombie" state, which is signified by the value SZOMB in the p_stat field of the proc structure. At this point, the proc structure is the only vestige of the process remaining on the system (it's pretty minimal, see /usr/include/sys/proc.h), and its purpose is to maintain process exit status and accounting information for the parent. A process in this state will appear as a "zombie" in the "ps" display.

When the parent reaps the process using wait(), wait3(), or whatever else is fashionable these days, the proc struct is discarded and the process is completely gone.

Regarding "gcore": Since the VM of the process is discarded very shortly after the kernel sets the SWEXIT flag, when "gcore" sees SWEXIT, it concludes that the process has no VM to dump, so it tells you that the process is exiting and gives up. It cannot dump memory because there is no memory left to dump.

Rick Ace
Pixar
3240 Kerner Blvd, San Rafael CA 94901
...!{sun,ucbvax}!pixar!rta

[[ Thank you very much!  That was most informative.  --wnl ]]
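
The zombie-then-reap sequence described in the article can be watched from user space. What follows is only a minimal sketch, assuming a modern POSIX system with Python rather than SunOS 3.5; the function name spawn_and_reap is made up for illustration and nothing here touches the kernel structures (SWEXIT, SZOMB, struct proc) named above. The child exits at once, the parent collects its exit status with waitpid(), and a second waitpid() on the same pid fails with ECHILD, which is the user-visible sign that the process entry has been discarded.

```python
import os


def spawn_and_reap(exit_code):
    """Fork a child that exits immediately, reap it, and report what happened.

    Returns (exit status seen by the parent, whether a second wait found
    nothing left to reap). The name is hypothetical, chosen for this sketch.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit straight away, skipping interpreter cleanup.
        os._exit(exit_code)

    # Parent: once the child has exited and until this call returns, the
    # child is a zombie holding only its exit status for us.
    reaped_pid, status = os.waitpid(pid, 0)
    assert reaped_pid == pid

    # After reaping, the child is completely gone: waiting on the same pid
    # again fails with ECHILD, which Python raises as ChildProcessError.
    try:
        os.waitpid(pid, 0)
        gone = False
    except ChildProcessError:
        gone = True

    return os.WEXITSTATUS(status), gone


if __name__ == "__main__":
    print(spawn_and_reap(7))
```

If the parent slept before calling waitpid(), running "ps" in another window during that interval would be the way to see the child listed as a zombie; how it is labeled depends on the local "ps".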