Path: utzoo!attcan!utgpu!watserv1!maytag!watdragon!watsol.waterloo.edu!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Newsgroups: comp.unix.wizards
Subject: Sys V fork IS broken!
Message-ID: <1990Jul28.195032.18746@watdragon.waterloo.edu>
Date: 28 Jul 90 19:50:32 GMT
References: <480@amanue.UUCP> <13426@cbmvax.commodore.com> <573@oglvee.UUCP> <13435@smoke.BRL.MIL>
Sender: daemon@watdragon.waterloo.edu (Owner of Many System Processes)
Organization: University of Waterloo
Lines: 40

gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
 jr@oglvee.UUCP (Jim Rosenberg) writes:
 -But if system calls fail simply because of a very temporary bout of activity,
 -that is *not my problem*!  It's the kernel's problem...
 Oh, good grief.  It is SILLY to say that the kernel should be redesigned
 to compensate for bugs in application programs.

I've been earning my living writing application programs on Unix for some
years.  Sometimes application programs need to fork().  (In fact, an informal
scan of my memory fails to reveal an important non-trivial application that
never does a fork() (and the semantics of fork() are just right and one of the
best things about Unix (and those who talk about the need for a spawn() or a
run() call should spend a few years in the SYS$CREPRC mines (sorry for the
digression))).)

Every application I've written, and every other one I've seen (aside from
amateurish toys that don't check return codes) forks about like this:

  if ((child = fork()) == -1)
    FatalSystemError("Serious system trouble! Can't create process!");
  else if (child == 0)
  { /* child */ }
  else
  { /* parent */ }

I think this is right and Doug Gwyn's comment is (unusually for him) wrong.  

Having write(2) fail because a disk is full is OK - there are several
strategies which a program might reasonably adopt to handle this.  But having
fork() fail because of a likely-transient OS state is a stinking crock.  If
there is a good chance that the kernel can fix this up without a gratuitous
time delay, it should do so.  If not (i.e.  process creation has become
impossible) the whole system is seriously sick and all the applications should
ideally hear about this PDQ so they can start taking disaster relief
measures.  I don't really think there's a middle ground here.  And speaking
from my experience in the application community, I think describing absence of
special-purpose backoff & retry code for handling process creation failure by
the OS as "bugs in application programs" is pretty arrogant and unrealistic.

Cheers, Tim Bray, Open Text Systems, Waterloo, Ont.