Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!pt.cs.cmu.edu!cadre.dsl.pitt.edu!pitt!amanue!oglvee!jr
From: jr@oglvee.UUCP (Jim Rosenberg)
Newsgroups: comp.unix.i386
Subject: Re: Help!  Altos 5.3.1 fork is failing!
Message-ID: <509@oglvee.UUCP>
Date: 19 Oct 89 17:00:27 GMT
References: <506@oglvee.UUCP> <4219@cuuxb.ATT.COM>
Reply-To: jr@oglvee.UUCP (Jim Rosenberg)
Organization: Oglevee Computer Systems, Connellsville, Pa
Lines: 62

In article <4219@cuuxb.ATT.COM> dlm@cuuxb.UUCP (Dennis L. Mumaugh) writes:
>Ordinarily I don't answer questions like this as I work for
>support and customers pay money for answers, but .... 

Thank you for going above and beyond the call of duty.  Since I have an
unreliable operating system for which we paid real money, it's a comfort to
know we don't have to pay more real money to find out how to get relief from
the defects in what we already paid our money for.

>In article <506@oglvee.UUCP> jr@oglvee.UUCP (Jim Rosenberg)
>writes:
>
>        What the bleep is getcpages?
>
>        [...]
>
>        How could it fail on a request to get only 1 page unless
>        I'm out of swap space?
>
>How did you guess?

Are you *ABSOLUTELY* sure this is the only way getcpages can fail???  I already
have one response to the contrary.

>        (Which I'm not.  We're getting these with many many
>        thousand blocks of free swap space -- we have a swap(1)
>        which will show these.)
>
>Not true! /etc/swap only shows actual use of swap not committed use
>of swap.  Similarly for sar reports.

OK, you can tell me all you like that swap is broken and is lying to me and
that sar is broken and is lying to me (these are *my* fault???) and that I
really really am out of swap space, but frankly I just don't believe this.  I
*DID* add a new swap partition with swap -a (*before* posting the original
article, as a matter of fact.)  The system is clearly using it.  I got one
fork failure with no interactive users logged in -- we had 4 database servers
up and one client batch job, which had three or four child UNIX processes --
enough to page a bit perhaps, but nowhere *NEAR* enough load to exhaust
24,000 blocks of swap space.  If my swap space runs out with lots of users
then I can deal with that, but if that were my problem then the whole system
would come crashing to its knees many times a day.  I'm sorry, but I just
don't believe you're right that every fork failure happens because I truly am
out of swap space.

>True, some code isn't very robust and ought to sleep and wait for
>less load, but people who do forks don't examine error codes, nor
>do people who do execs.   fork and exec will return either ENOSPC or
>EAGAIN if you would check errno.
           ^^^
If **WHO** would check errno???  I beg your pardon?  I am supposed to dig into
cron with a can opener (we are a binary licensee, not source!) and somehow
"check" errno?  When I get a fork failure from a fork issued by cron it cutely
logs the fact that fork failed, and that it is "rescheduling".  Right.  It then
just falls asleep and no more cron jobs run.  When csh gets the fork failure
it simply reports "No more processes".  Um, just what would you like me to
check here?  It's *you folks in AT&T* who should check errno, don't you think?
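
For anyone who does have the source, here is roughly what "checking errno"
after a failed fork could look like.  This is only a minimal sketch, not
anything taken from cron or csh.  The helper name fork_with_retry, the retry
count, and the delay are invented for illustration.  EAGAIN and ENOSPC are
the codes quoted above; treating ENOMEM as retryable too is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: call fork() up to max_tries times, sleeping
 * delay_secs between attempts when the failure looks transient.
 * Returns what fork() returns on success (0 in the child, the child's
 * pid in the parent), or -1 with errno set to the last failure.
 */
pid_t fork_with_retry(int max_tries, unsigned int delay_secs)
{
    int attempt;
    int saved_errno = 0;

    assert(max_tries > 0);

    for (attempt = 1; attempt <= max_tries; attempt++) {
        pid_t pid = fork();
        if (pid >= 0)
            return pid;             /* success: child or parent */

        saved_errno = errno;        /* save before stdio can clobber it */
        fprintf(stderr, "fork attempt %d failed: %s (errno %d)\n",
                attempt, strerror(saved_errno), saved_errno);

        /* Assumption: only these errors are worth waiting out. */
        if (saved_errno != EAGAIN && saved_errno != ENOMEM &&
            saved_errno != ENOSPC)
            break;

        if (attempt < max_tries)
            sleep(delay_secs);      /* give the system time to recover */
    }

    errno = saved_errno;            /* report the last fork() error */
    return -1;
}
```

A caller would use the return value exactly as it would use fork()'s, and
on -1 would at least know from errno which resource the kernel says ran out,
rather than printing "No more processes" for every case.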
-- 
Jim Rosenberg                        pitt
Oglevee Computer Systems                 >--!amanue!oglvee!jr
151 Oglevee Lane                      cgh
Connellsville, PA 15425                                #include <disclaimer.h>