Path: utzoo!attcan!uunet!decwrl!bacchus.pa.dec.com!decvax.dec.com!mcnc!rti!dg-rtp!magic!rice
From: rice@dg-rtp.dg.com (Brian Rice)
Newsgroups: comp.unix.wizards
Subject: Re: Sys V fork IS broken!
Keywords: "What UUUUUUUUNIX meeeeans to meeeeeee..."
Message-ID: <1990Jul30.002642.18244@dg-rtp.dg.com>
Date: 30 Jul 90 00:26:42 GMT
References: <480@amanue.UUCP> <13426@cbmvax.commodore.com> <573@oglvee.UUCP> <13435@smoke.BRL.MIL> <1990Jul28.195032.18746@watdragon.waterloo.edu>
Sender: usenet@dg-rtp.dg.com (Usenet Administration)
Reply-To: rice@dg-rtp.dg.com
Followup-To: comp.unix.wizards
Organization: Data General Corporation, Research Triangle Park, NC
Lines: 112

In article <1990Jul28.195032.18746@watdragon.waterloo.edu>,
tbray@watsol.waterloo.edu (Tim Bray) writes:
|> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
|> > jr@oglvee.UUCP (Jim Rosenberg) writes:
|> > -But if system calls fail simply because of a very temporary bout of
|> > -activity, that is *not my problem*!  It's the kernel's problem...
|> > Oh, good grief.  It is SILLY to say that the kernel should be redesigned
|> > to compensate for bugs in application programs.
|> 
|> I think [...] Doug Gwyn's comment is (unusually for him) wrong.
|>
|> Having write(2) fail because a disk is full is OK - there are several
|> strategies which a program might reasonably adopt to handle this.  But
|> having fork() fail because of a likely-transient OS state is a stinking
|> crock.

My fingers almost made me redirect followups to this post to
alt.religion.computers, because we are surely veering close to matters of
faith.  But I do think there's something to be said in defense of
traditional fork.

|> If there is a good chance that the kernel can fix this up without a
|> gratuitous time delay, it should do so.  If not (i.e.  process creation
|> has become impossible) the whole system is seriously sick and all the
|> applications should ideally hear about this PDQ so they can start taking
|> disaster relief measures.

If the kernel has to make a call to fork fail, it does so for one of exactly
two reasons: some system-imposed limit would be exceeded, or insufficient
memory is available.  That's all.  Neither of these conditions means that the
system is "seriously sick"; any process which isn't going to fork again and
(in the second case) isn't going to do anything malloc'y need never even hear
of the situation.

If the system really is "sick"--i.e., some internal data structure is
corrupted--then the system is going to panic, *now*, and rightly so.  (If
the kernel can't believe its own internal data, how can it credibly notify
processes to begin "disaster relief"?  Admittedly, there's a bit of computer
religion here: that programs should fail before they lie.  But I think that
sect has a great many adherents.)  Conversely, a system isn't sick just
because resources are under heavy contention.

And, of course, the kernel tells you why your fork failed: you get EAGAIN or
ENOMEM in errno.  All told, this means that you, the application programmer,
get to choose what happens in the event of a fork failure, and you even get
some information to help your application make the choice.  That "Put the
programmer in the driver's seat" orientation really is what UNIX means to me.

|> And speaking from my experience in the application community, I think
|> describing absence of special-purpose backoff & retry code for handling
|> process creation failure by the OS as "bugs in application programs" is
|> pretty arrogant and unrealistic.

"Special-purpose backoff and retry code"?  Can the kernel really do better
than this?

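   /* Retry until fork succeeds or we have failed too many times.
      Assumes error_count starts at 0; the helpers, junk, and the
      upper-case constants are placeholders for the application's own. */
   while ((child = fork()) == -1 && ++error_count < MAX_FORK_FAILURES) {
       switch (errno) {
          case ENOMEM:
             /* Out of memory: give some back if we can, then retry at once. */
             if (theres_some_junk_I_can_free()) {
                 free(junk);
                 break;
             }
             /* fall through */
          case EAGAIN:
             /* Limit hit (or nothing to free): wait a while, then retry. */
             sleep(MAYBE_LIFE_WILL_BE_NICER_IN_THIS_MANY_SECONDS);
             break;
          default:
             FatalError("Argh!  The man page lied!  #@!$& phone company OS!");
             exit(1);
       }
   }
   /* Still -1 here means we used up all our retries. */
   if (child == -1) {
       FatalError("Waaah!  The kernel won't let me fork!");
       exit(1);
   }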

Well, maybe the kernel could queue each fork request that it was unable to
complete and then satisfy each request in order...or maybe it could satisfy
the smallest request first, with some kind of aging mechanism to keep from
starving forks of big processes, etc., etc....this would get complicated,
clearly, and might even require so much overhead as to provoke thrashing.
But maybe you could do it.  If you could, then how would you deal with the
person who said, "Wait--if the system is low on memory, I don't want my fork
retried; I want to hear about it so I can go off and do something else (maybe
just sleep), then retry"?  This is the person who liked the old fork, and
there are lots of such folk.  Looks like you'll have to add an
old-fork-behavior flag, and then you'll have two kinds of forks, some on a
queue and some not, and all wanting resources...

Clearly, this way lies VMS$MADNESS.  Let's hear it for minimal function calls
with clean interfaces, even if they necessitate a few more lines of
application code.  After all, *you* get to write that code, and you can
package it up into a library function if you don't want to type it more than
once.

Brian Rice   rice@dg-rtp.dg.com   +1 919 248-6328
DG/UX Product Assurance Engineering
Data General Corp., Research Triangle Park, N.C.