Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!snorkelwacker!bloom-beacon!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.questions
Subject: Re: Some questions
Keywords: death of child, file descriptors, bind
Message-ID: <1990Jun28.235729.26823@athena.mit.edu>
Date: 28 Jun 90 23:57:29 GMT
References: <1716@jura.tcom.stc.co.uk>
Sender: news@athena.mit.edu (News system)
Reply-To: jik@athena.mit.edu (Jonathan I. Kamens)
Organization: Massachusetts Institute of Technology
Lines: 113

In article <1716@jura.tcom.stc.co.uk>, ct@tcom.stc.co.uk (Clive Thomson) writes:
|> 1) The document for the dup call says that it will return the lowest numbered
|> file descriptor not used by the process. With the exception of one line
|> in "The design and implementation of the BSD 4.3 UNIX operating system"
|> (Leffler et al) I have seen no documentation to say open, creat and socket
|> will do the same. Observation of open seems to suggest that the lowest fd
|> is used, but I would like to be sure.

All file descriptor allocation in the kernel works on the "use the
lowest available fd" system, so open, creat and socket behave the same
way as dup in this respect.

|> 2) When I am doing socket programming (ULTRIX 3.0 and SunOS4), and I do a
|> bind, if the program terminates abnormally, I find that when I re-run the
|> program the bind will fail with an "in use" error. Is there any way to
|> convince the system that it is no longer "in use" (assuming of course
|> uid, gid etc are the same).

I've noticed that the kernel sometimes gets confused about the state of
a socket which isn't being used anymore; a program exiting abnormally is
one way to cause this (although it doesn't always happen). What ends up
happening is that the socket stays around in CLOSE_WAIT status, so that
no new connections can be made to it. Occasionally, the CLOSE_WAIT
eventually goes away and it's once again possible to connect to the
socket.

However, if you don't want to wait and see if that'll happen, and you
don't want to have to reboot the system in order to get the socket to go
away, there is a way to force the bind to succeed. What you need to do
(at least in BSD; I don't know what happens with things like this in
SysV) is to use the setsockopt() call to set the SO_REUSEADDR option on
your new socket, before you attempt to bind() it to the address which is
busy.

Keep in mind that this option applies to all sockets, not just the ones
that are in CLOSE_WAIT, so if another program really is using the
address and you try to bind to it again with SO_REUSEADDR set, the bind
may succeed and the other program could very well lose.

|> 3) I am a little confused by the "death of child signal". Is the following
|> correct. If the parent ignores this signal, the kernel will release
|> entries for zombie processes automatically. If the parent uses the default
|> handler, it must wait() for the death of each child, or the child will
|> become a zombie. If the parent invokes its own handler, in this handler
|> a wait should be invoked, otherwise the child will become a zombie. If
|> the parent dies before the children, all children are adopted by the init
|> process, and the programmer need no longer worry about zombie processes.

Unfortunately, it's impossible to generalize how the death of child
processes should behave, because the exact mechanism varies among the
various flavors of Unix. Perhaps someone who's "in the know" (or at
least more so than I am) about POSIX can tell us what the POSIX standard
behavior (if there is any) for this is.

First of all, by default, you have to do a wait() for child processes
under ALL flavors of Unix. That is, there is no flavor of Unix that I
know of that will automatically clean up child processes that exit
unless you do something to tell it to do so.
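
As a rough illustration of that default case, here is a minimal sketch of
a parent that forks one child and then does the wait() itself. The helper
name run_child_and_wait() is made up for this example, and the "int"
status argument assumes a reasonably modern <sys/wait.h>; treat it as a
sketch under those assumptions, not as the one right way to do it.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): fork a child that
 * exits immediately with `code`, then wait() for it in the parent.
 * Returns the child's exit status, or -1 on any error.
 */
int run_child_and_wait(int code)
{
    int status;
    pid_t done;
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit right away */

    /*
     * Parent: until wait() returns for this child, the exited child
     * stays around as a zombie.  Retry if a signal interrupts us.
     */
    while ((done = wait(&status)) != pid) {
        if (done == -1 && errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(7) should hand back 7 once the child has been
collected; if the parent never made the wait() call, the child's process
table entry would linger for as long as the parent kept running.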

Second, allegedly, under some SysV-derived systems, if you do
"signal(SIGCHLD, SIG_IGN)", then child processes will be cleaned up
automatically, with no further effort on your part. However, people have
told me that they've never seen this actually work; the best way to find
out if it works at your site is to try it, although if you are trying to
write portable code, it's a bad idea to rely on this in any case.

If you can't use SIG_IGN to force automatic clean-up, then you've got to
write a signal handler to do it. It isn't easy at all to write a signal
handler that does things right on all flavors of Unix, because of the
following inconsistencies:

On some flavors of Unix, the SIGCHLD signal handler is called if one *or
more* children have died. This means that if your signal handler only
does one wait() call, then it won't clean up all of the children.
Fortunately, I believe that all Unix flavors for which this is the case
have available to the programmer the wait3() call, which allows the
WNOHANG option to check whether or not there are any children waiting to
be cleaned up. Therefore, on any system that has wait3(), your signal
handler should call wait3() over and over again with the WNOHANG option
until there are no children left to clean up.

On SysV-derived systems, SIGCHLD signals are regenerated if there are
child processes still waiting to be cleaned up after you exit the
SIGCHLD signal handler. Therefore, it's safe on most SysV systems to
assume when the signal handler gets called that you only have to clean
up one child, and assume that the handler will get called again if there
are more to clean up after it exits.

On older systems, signal handlers are automatically reset to SIG_DFL
when the signal handler gets called. On such systems, you have to put
"signal(SIGCHLD, catcher_func)" (where "catcher_func" is the name of the
handler function) as the first thing in the signal handler, so that the
handler gets reinstalled.
Unfortunately, there is a race condition which may cause you to get a
SIGCHLD signal and have it ignored between the time your handler gets
called and the time you reinstall it. Fortunately, newer implementations
of signal() don't reset the handler to SIG_DFL when the handler function
is called.

The summary of all this is that on systems that have wait3(), you should
use that and your signal handler should loop, and on systems that don't,
you should have one call to wait() per invocation of the signal handler.
Also, if you want to be 100% safe, the first thing your handler should
do is reinstall itself as the handler for SIGCHLD, even though it isn't
necessary to do this on most systems nowadays.

Jonathan Kamens                               USnail:
MIT Project Athena                              11 Ashford Terrace
jik@Athena.MIT.EDU                              Allston, MA  02134
Office: 617-253-8495                          Home: 617-782-0710
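
To make the summary above a bit more concrete, here is a sketch of such a
handler. Some assumptions to flag: it uses waitpid(-1, &status, WNOHANG),
the POSIX call, standing in for wait3() with WNOHANG; the counter
reaped_children and the helpers install_child_reaper() and
spawn_and_reap() are invented for this example; and it has not been tried
across the Unix flavors discussed above. On systems that have
sigaction(), installing the handler with that call avoids the reset
problem entirely, so the reinstall step is only there for the "100% safe"
case.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Children reaped by the handler so far (hypothetical counter, demo only). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler; "catcher_func" is the name used in the text above. */
static void catcher_func(int sig)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    int status;

    /* Reinstall first, in case this system reset the handler to SIG_DFL. */
    signal(sig, catcher_func);

    /* One signal may stand for several dead children: loop until none. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped_children++;

    errno = saved_errno;
}

/* Hypothetical helper: install the handler; 0 on success, -1 on error. */
int install_child_reaper(void)
{
    return signal(SIGCHLD, catcher_func) == SIG_ERR ? -1 : 0;
}

/*
 * Hypothetical demo: fork n children that exit at once, then give the
 * handler up to about ten seconds to reap them all.
 * Returns the number of children reaped, or -1 if fork() fails.
 */
int spawn_and_reap(int n)
{
    int i, tries;

    reaped_children = 0;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
    }
    for (tries = 0; tries < 10 && reaped_children < n; tries++)
        sleep(1);               /* returns early when a signal arrives */
    return (int)reaped_children;
}
```

A program would call install_child_reaper() once, before forking any
children. The loop is the important part: because several children can
die before the handler runs once, a single wait() per signal could leave
zombies behind, whereas the WNOHANG loop collects every child that is
ready and then returns without blocking.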