Xref: utzoo comp.unix.i386:796 comp.unix.wizards:18688
Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!ginosko!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!cadre.dsl.pitt.edu!pitt!amanue!oglvee!jr
From: jr@oglvee.UUCP (Jim Rosenberg)
Newsgroups: comp.unix.i386,comp.unix.wizards
Subject: Help! Altos 5.3.1 fork is failing!
Message-ID: <506@oglvee.UUCP>
Date: 14 Oct 89 22:43:49 GMT
Organization: Oglevee Computer Systems, Connellsville, Pa
Lines: 53

We just recently "upgraded" [sic] an Altos 2000 from Xenix 5.2c to UNIX 5.3d.
uname reports the operating system as 5.3.1. We have 4M RAM and before the
upgrade the machine just screamed. Now we are paging like mad and getting
sporadic fork failures. The increased paging activity has my users bitching
and moaning, but the fork failures are like a sniper loose in my system
gunning down processes sporadically.

The problem is surely *not* insufficient process table slots. crash(1)
reports we have 180 slots (NPROC is 0 in the tuning parameter file, which on
this system is called /usr/sys/master.d/kernel) and we've got nowhere within
a country mile of that many processes. The per-user limit is 30, and we're
getting fork failures where that's not exceeded either. The system error
reporting is filled with messages like this:

000146 07:50:06 00e6f0f6 ... 0000 00
    NOTICE: getcpages - waiting for 1 contiguous pages
000147 08:13:16 00e80082 ... 0000 00
000148 08:13:16 00e80082 ... 0000 00
    NOTICE: getcpages - Insufficient memory to allocate 1 contiguous page
    - system call failed
      ^^^^^^^^^^^^^^^^^^

In many cases I can exactly correlate one of these "system call failed"
messages with a fork failure. According to the man page for fork(2) there
are 3 ways a fork can fail: No process table slots left, exceeding the
per-user limit, and a most obscure indeed 3rd one: "Total amount of system
memory available when reading via raw IO is temporarily insufficient".
Either the man page lies or this third one is it.
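For the batch jobs, one stopgap would be to check errno when fork fails and retry on what looks like a transient shortage. Below is a minimal sketch, not a fix for the underlying problem. It assumes the three conditions from the man page show up as EAGAIN (some systems also use ENOMEM; check your own fork(2)), and the helper name fork_with_retry and its parameters are made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * fork_with_retry: hypothetical helper, not part of any system library.
 * Calls fork() and, if it fails with EAGAIN or ENOMEM (assumed here to be
 * the transient cases), sleeps delay_secs and tries again, for at most
 * max_tries attempts in total.
 * Returns what fork() returns: the child's pid in the parent, 0 in the
 * child, or -1 with errno set if every attempt failed.
 */
pid_t fork_with_retry(int max_tries, unsigned int delay_secs)
{
    int attempt;
    pid_t pid;

    if (max_tries < 1) {
        errno = EINVAL;
        return -1;
    }

    for (attempt = 1; attempt <= max_tries; attempt++) {
        int saved;

        pid = fork();
        if (pid >= 0)
            return pid;         /* success, in either parent or child */

        saved = errno;          /* fprintf/sleep may clobber errno */
        if (saved != EAGAIN && saved != ENOMEM)
            return -1;          /* not a shortage we know how to wait out */

        fprintf(stderr, "fork failed (attempt %d of %d): %s\n",
                attempt, max_tries, strerror(saved));
        if (attempt < max_tries)
            sleep(delay_secs);
        errno = saved;
    }
    return -1;
}
```

A caller would use fork_with_retry(5, 2) in place of fork() and treat the return value the same way. The stderr line at least records which errno the kernel handed back, which can then be matched against the getcpages notices by time.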
I took a blind stab and guessed that the parameter involved here is PBUF.
Altos recommends PBUF=8 straight across the board no matter how much memory
you have. Sounds pretty odd to me, since on a 6386 running V.3.2 with 2 Meg
RAM I've got 20, and never fiddled with it. I jacked up PBUF to 16 -- but it
made no difference.

So, my questions are: What the bleep is getcpages? It sounds like an
internal kernel routine to get contiguous pages in RAM. Is this call issued
by the paging daemon? How could it fail on a request to get only 1 page
unless I'm out of swap space? (Which I'm not. We're getting these with many
many thousand blocks of free swap space -- we have a swap(1) which will show
these.) Is there a tunable parameter that will rescue me here?

Altos seems to think that a failed fork should only get a "NOTICE". Yeah,
well, I notice all right. It's bad enough when the shell reports "No more
processes" -- you just try again and it works. But we have all kinds of
batch jobs that spawn uux requests and other such things and they're just
getting shot right out of the sky.

Any words of wisdom gratefully accepted! I skimmed over the likeliest parts
of Bach to see if the light would dawn -- looks like I better go back and
reread the section on demand paging pretty carefully.
-- 
Jim Rosenberg                       pitt
Oglevee Computer Systems                 >--!amanue!oglvee!jr
151 Oglevee Lane                     cgh
Connellsville, PA 15425                  #include