Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!pacbell.com!ucsd!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: Performance Tuning Ultrix 4.1
Keywords: paging swapping fast large load BSD
Message-ID: <12792@dog.ee.lbl.gov>
Date: 3 May 91 19:06:39 GMT
References: <12714@dog.ee.lbl.gov> <1991May2.052140.27048@milton.u.washington.edu>
 <12759@dog.ee.lbl.gov> <1991May2.231911.23612@milton.u.washington.edu>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 87
X-Local-Date: Fri, 3 May 91 12:06:39 PDT

In article <1991May2.231911.23612@milton.u.washington.edu>
corey@milton.u.washington.edu (Corey Satten) writes:
>however I believe that code never executed on Ultrix or BSD because of
>the code a few lines before (BSD code fragment):
>
>    if ((rp->p_flag & (SSEQL|SUANOM)) == 0 &&
>        rp->p_rssize <= rp->p_maxrss)
>            return (0);
>
>which the front hand does for valid pages.  I think this means that unless
>the process has executed a vadvise() to warn of sequential or anomalous
>paging behavior, the front hand never invalidates data pages.

Not in the old BSD kernel.  The code path is, for the front hand (on
the VAX):

    /*
     * `page cluster' info is generally treated as `bits in the
     * first pte mapping a page in the cluster', hence the `mark first'
     * code below.
     */
    if (page cluster is valid) {
        mark page cluster invalid;
        mark process SPTECHG;
        if (any page in the cluster is modified)
            mark the first page modified;
        make all pages in the cluster look like the first one;
        if (it is a text page cluster)
            let other users know about changes;
        if (process is normal)
            we are done, return 0;
    }

The SEQL and SUANOM cases, and the rssize > maxrss case, are where the
process should not be paged in LRU fashion but rather in `almost MRU'.
In this case, if the page was valid and we made it not-valid we also
try to page it out.  This is not quite right---we should not be paging
out text pages as the vadvise call is for data, not text---but is
probably `good enough'.

On the Tahoe, which has a reference bit, the valid bit and `fast
reclaim' stuff is unnecessary, and the code path looks like:

    if (this is a text page cluster)
        if (any process using it has referenced it)
            mark the first page as referenced;
    if (the page cluster has been referenced) {
        mark page cluster not-referenced;
        if (any page in the cluster is modified)
            mark the first page modified;
        make all pages in the cluster look like the first one;
        if (it is a text page cluster)
            let other users know about changes;
        if (process is normal)
            we are done, return 0;
    }

The back hand, of course, never looks at valid/referenced pages at all.

Thus, the code in `>' above is meant only to cause the front hand to
do pageouts.  For most processes, the front hand merely paves the way
for the back hand to do pageouts.  Imagine a wall clock with both hands
moving at the same speed: the time it takes for the back hand to pass
the same spot as the front hand is the time a process is given to
`reclaim' a page.  We take it away, but leave it in memory, and if you
ask for it before the back hand gets around to it, you get to keep it.
If not, we dust off the page (write it to swap) if it is dirty, and
then put it in the `clean' pile (free pages).

For processes that have done a vadvise(SSEQL) or (SUANOM), we presume
instead that `recently used' means `unlikely to be reused', so in this
case we have the front hand dust it off instead---if you just used it,
we take it away.

The planned 4.2BSD `new VM' (which is only now being implemented by
beating the Mach VM into a different shape) had an `madvise' call which
was intended to mark anomalous or sequential behaviour on a per-region
or per-page basis, rather than per-process.
In the meantime the old vadvise call was deprecated, but it lives
on. . . .

Presumably someone broke this in the DEC MIPS port.  The MIPS chip does
not have PTEs, so PTEs must be done in software, so you can define your
own used/modified/ref'd/etc bits.  This is much easier in the new
Mach-based VM, where the responsibility for hardware management is in
a separate file.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
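
The two-handed clock described in the post can be sketched as a toy
model in C.  This is not the BSD pageout code: every name here (struct
vmstate, HANDSPREAD, clock_tick, touch) is hypothetical, pages stand in
for page clusters, text pages are ignored, and the rssize > maxrss case
is left out.  It only illustrates the front hand invalidating, the
back hand freeing whatever was not reclaimed in between, and the
`almost MRU' treatment for a process flagged as anomalous.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES     8   /* hypothetical: number of pages in the loop */
#define HANDSPREAD 4   /* hypothetical: gap between the two hands */

struct page {
    bool valid;      /* stands in for the valid (VAX) or referenced (Tahoe) bit */
    bool dirty;      /* modified since it was last written to swap */
    bool free;       /* on the free list */
    bool anomalous;  /* owner asked for `almost MRU' treatment */
};

struct vmstate {
    struct page pages[NPAGES];
    int front;       /* page the front hand visits next */
    int writes;      /* simulated writes to swap */
};

static void vm_init(struct vmstate *vm)
{
    for (int i = 0; i < NPAGES; i++)
        vm->pages[i] = (struct page){ .valid = true };
    vm->front = 0;
    vm->writes = 0;
}

/* Write the page to swap if it is dirty, then free it. */
static void pageout(struct vmstate *vm, struct page *p)
{
    if (p->dirty) {
        vm->writes++;
        p->dirty = false;
    }
    p->valid = false;
    p->free = true;
}

/* Advance both hands by one page. */
static void clock_tick(struct vmstate *vm)
{
    assert(vm->front >= 0 && vm->front < NPAGES);
    struct page *f = &vm->pages[vm->front];
    struct page *b = &vm->pages[(vm->front - HANDSPREAD + NPAGES) % NPAGES];

    /* Front hand: take the page away but leave it in memory,
     * unless the owner is anomalous, in which case page it out now. */
    if (!f->free && f->valid) {
        f->valid = false;
        if (f->anomalous)
            pageout(vm, f);
    }
    /* Back hand: only pages nobody reclaimed since the front hand passed. */
    if (!b->free && !b->valid)
        pageout(vm, b);

    vm->front = (vm->front + 1) % NPAGES;
}

/* A process touches page i; returns true if it was still in memory
 * (a cheap reclaim), false if it had already been freed. */
static bool touch(struct vmstate *vm, int i)
{
    struct page *p = &vm->pages[i];
    bool reclaimed = !p->free;
    p->free = false;
    p->valid = true;
    return reclaimed;
}
```

Calling vm_init() and then clock_tick() repeatedly, a normal page that
is touched within HANDSPREAD ticks of the front hand passing it stays
resident, while a page marked anomalous is freed by the front hand
itself.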