Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!stanford.edu!agate!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: Performance Tuning Ultrix 4.1
Keywords: paging swapping fast large load BSD
Message-ID: <12759@dog.ee.lbl.gov>
Date: 2 May 91 19:18:19 GMT
References: <1991Apr30.160331.16215@milton.u.washington.edu> <12714@dog.ee.lbl.gov> <1991May2.052140.27048@milton.u.washington.edu>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 85
X-Local-Date: Thu, 2 May 91 12:18:20 PDT

In article <1991May2.052140.27048@milton.u.washington.edu>
corey@milton.u.washington.edu (Corey Satten) writes:
>Oh dear, I can't believe I'm about to take issue with Chris Torek.

Well, I *do* goof sometimes :-)

>Chris, I find almost identical code in 4.3BSD (tahoe I think).  In file
>vm_page.c the lines which prevent data from paging are almost identical

I was only talking about your second posting, which I thought dealt
only with the swap code.  I presume you mean these three lines:

	if (c->c_type != CTEXT) {
		if (rp->p_rssize < saferss - rp->p_slptime)
			return (0);
	}

Since p_rssize is in units of `core clicks' (512 or 1024 bytes) and
saferss is 32 and p_slptime is in [0..127] and all of the numbers are
signed (I checked :-) ), this should only affect processes with less
than 16 or 32 KB of resident set size, and once they have been asleep
for 32 seconds, it should not affect them at all (since rssize will
always be >= 0 and the rhs of the compare will be <= 0).

>In file vm_sched.c the code near the comment about swapping out
>deadwood is almost identical with Ultrix and on our system, that
>was the code which was doing almost all of the swapping

Aha!  This code is supposed to run `almost never', as far as I can tell.
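
A rough sketch of the `deadwood' test under discussion, pieced together
from the description in this article rather than copied from
vm_sched.c.  The struct, its field names, the `busy' flag, and the
function name are all made up for illustration, and treating `already
out' as a resident set size of zero is an assumption:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's proc entry. */
struct proc_sketch {
	int  rssize;	/* resident set size, in core clicks */
	int  slptime;	/* seconds since the process last ran */
	bool busy;	/* something funny going on with the text or proc */
};

#define MAXSLP	20	/* `a long time', in seconds (per the article) */

/*
 * Return true if the process looks like deadwood: we need memory
 * (freemem < desfree) or the process is already out (assumed here to
 * mean rssize == 0), and it has been idle for more than MAXSLP
 * seconds, and nothing funny is going on with it.
 */
static bool
is_deadwood(const struct proc_sketch *p, int freemem, int desfree)
{
	bool need_memory = freemem < desfree;
	bool already_out = p->rssize == 0;

	return ((need_memory || already_out) &&
	    p->slptime > MAXSLP && !p->busy);
}
```

With freemem comfortably above desfree, a process that still has
resident pages is never picked by this test, no matter how long it has
slept; that is why the test should fire only rarely when the pageout
daemon is keeping up.
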
The idea is: if we think we need memory, or if the process is already
out; and if this process has not run for `a long time' (20 seconds);
and if nothing funny is going on with the text or proc; throw it out.
(See below as to why already-out means anything.)  Hence `freemem <
desfree' rather than `freemem < lotsfree': the pageout daemon is
supposed to keep free memory in the range [desfree..lotsfree) under
normal load.

>(though possibly because I elevated the values of lotsfree and
>desfree and scan rate to improve performance in round 1).  I think you
>are talking about the "hardswap" code (when desperate == 1), which on
>our system, turned out to rarely execute.

I was.  (I believed the comment rather than trying to figure out the
code.  This code is all gone in the new Mach-based VM anyway.)

>Furthermore, now that we are paging out data, we aren't swapping
>processes with RSS>0 at all so I think the paging part of the fix may
>be more important than the swapping part anyway.

It is.  The swap code in the old BSD VM is only supposed to fire off
in a few special cases:

 - very low on memory, and pageout daemon cannot keep up (this is the
   `hardswap' case);

 - expansion swaps (need space between p0 and p1 and the usual easy
   expansion failed);

 - process has already paged out entirely, and the UPAGES pages of u.
   might help, so kick out its u. as well (this is one of the
   `deadwood' cases)---since UPAGES is 16 (*1024 bytes) this is not
   really very profitable;

 - very low on memory (freemem < desfree) and process has been idle
   for some time (this is the other `deadwood' case);

 - `kernelmap' has become fragmented (need contiguous pte pages): we
   swap like crazy just to defragment it (horrible, but rare).

Anyway, now that I am looking at the hardswap code, I think you are
right: the code iterates through the whole proc table (never stops)
but only accumulates `big' processes if it does not find a `sleeper'
(something sleeping for > 20 seconds).
Generally there is always at least one such, and it takes that one and
then starts the whole thing over (as you described).  If the paging
system has done its job, however, this will gain little and after a
dozen or so sleepers have been swapped out, the `big' process code
will fire.

Incidentally, it is not surprising that the code worked poorly on
DECstations: it is tuned for machines on which the CPU is considerably
slower than the I/O, rather than the other way around.  On the 780, it
was often better to `work hard' than to `work smart'....
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov