Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!snorkelwacker!bloom-beacon!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.questions
Subject: Re: Some questions
Keywords: death of child, file descriptors, bind
Message-ID: <1990Jun28.235729.26823@athena.mit.edu>
Date: 28 Jun 90 23:57:29 GMT
References: <1716@jura.tcom.stc.co.uk>
Sender: news@athena.mit.edu (News system)
Reply-To: jik@athena.mit.edu (Jonathan I. Kamens)
Organization: Massachusetts Institute of Technology
Lines: 113

In article <1716@jura.tcom.stc.co.uk>, ct@tcom.stc.co.uk (Clive Thomson)
writes:
|> 1) The document for the dup call says that it will return the lowest
|>    numbered file descriptor not used by the process. With the exception of
|>    one line in "The design and implementation of the BSD 4.3 UNIX operating
|>    system" (Leffler et al) I have seen no documentation to say open, creat
|>    and socket will do the same. Observation of open seems to suggest that
|>    the lowest fd is used, but I would like to be sure.

  All file descriptor allocation in the kernel works on the "use the
lowest available fd" system.
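
  If you'd like to convince yourself, a small experiment along the
following lines should do it.  This is only a sketch: reopen_lowest() is
a name made up for illustration, and it assumes that no descriptor
numbered lower than the one being closed is free at the time.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not a system call): release descriptor `fd`,
 * then open `path` read-only and return whatever descriptor the kernel
 * hands back, or -1 on error.  If `fd` was the lowest free slot after
 * the close(), the "lowest available fd" rule means the return value
 * should be `fd` again. */
int reopen_lowest(int fd, const char *path)
{
    if (close(fd) < 0)
        return -1;
    return open(path, O_RDONLY);
}
```

  For example, if a program opens three files in a row and then calls
reopen_lowest() on the second descriptor, it should get that same
descriptor number back.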

|> 2) When I am doing socket programming (ULTRIX 3.0 and SunOS4), and I do a 
|>    bind, if the program terminates abnormally, I find that when I re-run the
|>    program the bind will fail with an "in use" error. Is there any way to
|>    convince the system that it is no longer "in use" (assuming of course 
|>    uid, gid etc are the same).

  I've noticed that the kernel sometimes gets confused about the state
of a socket which isn't being used anymore; a program exiting abnormally
is one way to cause this (although it doesn't always happen).  What ends
up happening is that the socket stays around in CLOSE_WAIT status, so
that nothing new can be bound to its address.

  Sometimes the CLOSE_WAIT goes away by itself after a while and it's
once again possible to bind to the address.  However, if you don't want
to wait and see if that'll happen, and you don't want to have to reboot
the system in order to get the socket to go away, there is a way to
force the bind to succeed.

  What you need to do (at least in BSD; I don't know what happens with
things like this in SysV) is to use the setsockopt() call to set the
SO_REUSEADDR option on your new socket, before you attempt to bind() it
to the address which is busy.
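
  Roughly, the sequence looks like the sketch below.  The function name
bind_reusable() and the choice of a TCP socket on INADDR_ANY are my own
assumptions for the sake of illustration; the only part that matters is
that setsockopt() comes before bind().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, turn on SO_REUSEADDR, and
 * bind it to `port` on all local addresses.  Returns the descriptor,
 * or -1 on failure. */
int bind_reusable(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    /* The option has to be set before bind() is called. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

  A server would then call listen() and accept() on the returned
descriptor as usual.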

  Keep in mind that this option works for all sockets, not just the
ones that are in CLOSE_WAIT, so if another program really is using the
address and you try to bind to it again with SO_REUSEADDR set, your
bind may go through and the other program could very well lose.

|> 3) I am a little confused by the "death of child signal". Is the following
|>    correct. If the parent ignores this signal, the kernel will release
|>    entries for zombie processes automatically. If the parent uses the
|>    default handler, it must wait() for the death of each child, or the
|>    child will become a zombie. If the parent invokes its own handler, in
|>    this handler a wait should be invoked, otherwise the child will become
|>    a zombie. If the parent dies before the children, all children are
|>    adopted by the init process, and the programmer need no longer worry
|>    about zombie processes.

  Unfortunately, it's impossible to generalize how the death of child
processes should behave, because the exact mechanism varies over the
various flavors of Unix.  Perhaps someone who's "in the know" (or at
least more so than I am) about POSIX can tell us what the POSIX standard
behavior (if there is any) for this is.

  First of all, by default, you have to do a wait() for child processes
under ALL flavors of Unix.  That is, there is no flavor of Unix that I
know of that will automatically flush child processes that exit, even if
you don't do anything to tell it to do so.

  Second, allegedly, under some SysV-derived systems, if you do
"signal(SIGCHLD, SIG_IGN)", then child processes will be cleaned up
automatically, with no further effort on your part.  However, people
have told me that they've never seen this actually work; the best way to
find out if it works at your site is to try it, although if you are
trying to write portable code, it's a bad idea to rely on this in any case.
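
  If you do want to try it at your site, a probe along these lines is
one way to go about it.  It's a sketch only: sigchld_ign_reaps() is a
made-up name, and it assumes the calling process has no other children
at the time it runs.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if ignoring SIGCHLD makes the kernel
 * clean up dead children by itself, 0 if the dead child is still there
 * for wait() to collect, and -1 if the probe itself failed. */
int sigchld_ign_reaps(void)
{
    void (*old)(int);
    pid_t pid, got;
    int status, result;

    old = signal(SIGCHLD, SIG_IGN);
    if (old == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* Where SIG_IGN gives automatic clean-up, wait() fails with ECHILD
     * once the child is gone; elsewhere it returns the child's pid. */
    got = wait(&status);
    if (got == pid)
        result = 0;
    else if (got == (pid_t)-1 && errno == ECHILD)
        result = 1;
    else
        result = -1;

    signal(SIGCHLD, old);       /* put the old disposition back */
    return result;
}
```

  Whatever the probe reports on one machine, it says nothing about the
next one, which is why portable code shouldn't depend on it.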

  If you can't use SIG_IGN to force automatic clean-up, then you've got
to write a signal handler to do it.  It isn't easy at all to write a
signal handler that does things right on all flavors of Unix, because of
the following inconsistencies:

  On some flavors of Unix, the SIGCHLD signal handler is called if one
*or more* children have died.  This means that if your signal handler
only does one wait() call, then it won't clean up all of the children. 
Fortunately, I believe that all Unix flavors for which this is the case
make the wait3() call available to the programmer; its WNOHANG option
lets you check, without blocking, whether or not there are any children
waiting to be cleaned up.  Therefore, on any system that has wait3(),
your signal handler should call wait3() over and over again with the
WNOHANG option until there are no children left to clean up.
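
  A handler of that kind might look something like the following
sketch.  The names reap_children() and reaped_children are mine, for
illustration, and the int status argument is what current systems
declare for wait3(); older BSD headers may want a "union wait" there
instead.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* Hypothetical counter, kept only so the effect is observable. */
volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: clean up every child that has already exited. */
void reap_children(int sig)
{
    int saved_errno = errno;    /* wait3() may change errno */
    int status;

    (void)sig;

    /* With WNOHANG, wait3() returns 0 when children exist but none has
     * exited yet, and -1 when there are no children at all, so the
     * loop stops as soon as there is nothing left to clean up. */
    while (wait3(&status, WNOHANG, (struct rusage *)0) > 0)
        reaped_children++;

    errno = saved_errno;
}
```

  It would be installed once, early in the program, with something like
"signal(SIGCHLD, reap_children)".  On systems that have waitpid(),
"waitpid(-1, &status, WNOHANG)" can be used in the loop in the same way.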

  On SysV-derived systems, SIGCHLD signals are regenerated if there are
child processes still waiting to be cleaned up after you exit the
SIGCHLD signal handler.  Therefore, it's safe on most SysV systems to
assume when the signal handler gets called that you only have to clean
up one child, and assume that the handler will get called again if
there are more to clean up after it exits.

  On older systems, signal handlers are automatically reset to SIG_DFL
when the signal handler gets called.  On such systems, you have to put
"signal(SIGCHLD, catcher_func)" (where "catcher_func" is the name of
the handler function) as the first thing in the signal handler, so that
the handler gets re-installed.  Unfortunately, there is a race condition
which may cause you to get a SIGCHLD signal and have it ignored between
the time your handler gets called and the time you re-install it.
Fortunately, newer implementations of signal() don't reset the handler
to SIG_DFL when the handler function is called.
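
  For a system without wait3(), a handler that re-installs itself and
then does a single wait() could be sketched like this.  The counter
children_seen is a made-up name for illustration; catcher_func is the
name used above.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical counter, kept only so the effect is observable. */
volatile sig_atomic_t children_seen = 0;

/* SIGCHLD handler for systems that reset the handler to SIG_DFL and
 * offer only wait(): re-install first, then clean up one child. */
void catcher_func(int sig)
{
    int saved_errno = errno;    /* wait() may change errno */
    int status;

    (void)sig;

    /* First thing: put ourselves back, in case the system reset the
     * handler to SIG_DFL when it called us. */
    signal(SIGCHLD, catcher_func);

    /* One wait() per invocation. */
    if (wait(&status) > 0)
        children_seen++;

    errno = saved_errno;
}
```

  The program would call "signal(SIGCHLD, catcher_func)" once at
start-up; after that the handler keeps itself installed.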

  The summary of all this is that on systems that have wait3(), you
should use that and your signal handler should loop, and on systems that
don't, you should have one call to wait() per invocation of the signal
handler.  Also, if you want to be 100% safe, the first thing your
handler should do is reset the handler for SIGCHLD, even though it isn't
necessary to do this on most systems nowadays.

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710