Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!stanford.edu!agate!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: Performance Tuning Ultrix 4.1
Keywords: paging swapping fast large load BSD
Message-ID: <12759@dog.ee.lbl.gov>
Date: 2 May 91 19:18:19 GMT
References: <1991Apr30.160331.16215@milton.u.washington.edu> <12714@dog.ee.lbl.gov> <1991May2.052140.27048@milton.u.washington.edu>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 85
X-Local-Date: Thu, 2 May 91 12:18:20 PDT

In article <1991May2.052140.27048@milton.u.washington.edu>
corey@milton.u.washington.edu (Corey Satten) writes:
>Oh dear, I can't believe I'm about to take issue with Chris Torek.

Well, I *do* goof sometimes :-)

>Chris, I find almost identical code in 4.3BSD (tahoe I think).  In file
>vm_page.c the lines which prevent data from paging are almost identical

I was only talking about your second posting, which I thought dealt only
with the swap code.  I presume you mean these three lines:

	if (c->c_type != CTEXT) {
		if (rp->p_rssize < saferss - rp->p_slptime)
			return (0);
	}

Since p_rssize is in units of `core clicks' (512 or 1024 bytes),
saferss is 32, p_slptime is in [0..127], and all of the numbers
are signed (I checked :-) ), this should only affect processes whose
resident set is smaller than 32 clicks (16 or 32 KB).  Once they have
been asleep for 32 seconds it should not affect them at all, since
rssize will always be >= 0 and the rhs of the compare will be <= 0.
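
To make the arithmetic concrete, here is a minimal sketch of that
comparison as a standalone C function.  The function name and the
plain int parameters are assumptions made for illustration; the kernel
code tests the struct proc fields directly.

```c
/*
 * Hypothetical standalone version of the comparison quoted above.
 * SAFERSS uses the value cited in the text (32 core clicks); all
 * quantities are signed ints, as in the kernel.
 */
#define SAFERSS	32

/*
 * Returns 1 when a non-text page of this process would be skipped
 * (the `return (0)' path in the quoted code), 0 otherwise.
 */
int
data_page_skipped(int rssize, int slptime)
{
	return (rssize < SAFERSS - slptime);
}
```

For example, a process with 10 resident clicks that has just run is
skipped (10 < 32), while any process asleep for 32 seconds or more is
never skipped, because the right-hand side is then <= 0.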

>In file vm_sched.c the code near the comment about swapping out
>deadwood is almost identical with Ultrix and on our system, that
>was the code which was doing almost all of the swapping

Aha!  This code is supposed to run `almost never', as far as I can
tell.  The idea is:

	if we think we need memory, or if the process is already out; and
	if this process has not run for `a long time' (20 seconds); and
	if nothing funny is going on with the text or proc;
	throw it out.  (See below as to why already-out means anything)

Hence `freemem < desfree' rather than `freemem < lotsfree': the pageout
daemon is supposed to keep free memory in the range [desfree..lotsfree)
under normal load.
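
The conditions above can be sketched as a single predicate.  This is
only an illustration of the idea as described here, not the kernel
source: the struct, its field names, the function name, and the
`busy' flag standing in for `something funny with the text or proc'
are all assumptions.  `Already out' is taken to mean a resident set
size of zero.

```c
/* Hypothetical stand-in for the relevant parts of struct proc. */
struct dw_proc {
	int	rssize;		/* resident set size, in clicks */
	int	slptime;	/* seconds since the process last ran */
	int	busy;		/* nonzero: text or proc is in a funny state */
};

#define LONG_SLEEP	20	/* `a long time', in seconds */

/* Returns 1 if the process qualifies as deadwood, 0 otherwise. */
int
is_deadwood(const struct dw_proc *p, int freemem, int desfree)
{
	int need_memory = freemem < desfree;	/* below the daemon's range */
	int already_out = p->rssize == 0;	/* nothing left resident */

	return ((need_memory || already_out) &&
	    p->slptime > LONG_SLEEP && !p->busy);
}
```

Note that a process with pages still resident is only a candidate
when freemem has dropped below desfree, which the pageout daemon is
supposed to prevent under normal load.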

>(though possibly because I elevated the values of lotsfree and
>desfree and scan rate to improve performance in round 1).  I think you
>are talking about the "hardswap" code (when desperate == 1), which on
>our system, turned out to rarely execute.

I was.  (I believed the comment rather than trying to figure out the
code.  This code is all gone in the new Mach-based VM anyway.)

>Furthermore, now that we are paging out data, we aren't swapping
>processes with RSS>0 at all so I think the paging part of the fix may
>be more important than the swapping part anyway.

It is.  The swap code in the old BSD VM is only supposed to fire off
in a few special cases:

	- very low on memory, and pageout daemon cannot keep up (this
	  is the `hardswap' case);
	- expansion swaps (need space between p0 and p1 and the usual
	  easy expansion failed);
	- process has already paged out entirely, and the UPAGES pages
	  of u. might help, so kick out its u. as well (this is one of
	  the `deadwood' cases)---since UPAGES is 16 (*1024 bytes) this
	  is not really very profitable;
	- very low on memory (freemem < desfree) and process has been
	  idle for some time (this is the other `deadwood' case);
	- `kernelmap' has become fragmented (need contiguous pte
	  pages): we swap like crazy just to defragment it (horrible,
	  but rare).

Anyway, now that I am looking at the hardswap code, I think you are
right:  the code iterates through the whole proc table (it never stops
early) but only accumulates `big' processes if it does not find a
`sleeper' (something sleeping for > 20 seconds).  Generally there is
always at least one such, and it takes that one and then starts the
whole thing over (as you described).  If the paging system has done
its job, however, this will gain little, and after a dozen or so
sleepers have been swapped out, the `big' process code will fire.
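
A much simplified sketch of that selection loop follows.  It is not
the kernel code: the struct, the function name, and the choice to
track a single biggest process (rather than a list of big ones) are
assumptions made to keep the example short.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of struct proc. */
struct hs_proc {
	int	rssize;		/* resident set size, in clicks */
	int	slptime;	/* seconds asleep */
};

#define LONG_SLEEP	20	/* sleeper threshold, in seconds */

/*
 * Walk the whole table.  Prefer the longest sleeper; track the
 * biggest process only while no sleeper has been seen.  Returns
 * NULL for an empty table.
 */
const struct hs_proc *
pick_victim(const struct hs_proc *tab, size_t n)
{
	const struct hs_proc *sleeper = NULL, *big = NULL;
	size_t i;

	for (i = 0; i < n; i++) {	/* no early exit */
		const struct hs_proc *p = &tab[i];

		if (p->slptime > LONG_SLEEP) {
			if (sleeper == NULL || p->slptime > sleeper->slptime)
				sleeper = p;
		} else if (sleeper == NULL) {
			if (big == NULL || p->rssize > big->rssize)
				big = p;
		}
	}
	return (sleeper != NULL ? sleeper : big);
}
```

As long as any sleeper exists it wins, however small its resident
set, which is why swapping it out may free very little memory.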

Incidentally, it is not surprising that the code worked poorly on
DECstations: it is tuned for machines on which the CPU is considerably
slower than the I/O, rather than the other way around.  On the 780,
it was often better to `work hard' than to `work smart'....
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov