Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!brutus.cs.uiuc.edu!wuarchive!wugate!uunet!yale!leichter
From: leichter@CS.YALE.EDU (Jerry Leichter)
Newsgroups: comp.arch
Subject: Re: Memory utilization & inter-process contention
Message-ID: <70890@yale-celray.yale.UUCP>
Date: 28 Aug 89 22:56:02 GMT
Sender: root@yale.UUCP
Organization: Yale Computer Science Department, New Haven, Connecticut, USA
Lines: 103
X-from: leichter@CS.YALE.EDU (Jerry Leichter (LEICHTER-JERRY@CS.YALE.EDU))

In article <2389@auspex.auspex.com>, guy@auspex.auspex.com (Guy Harris) writes...
>>I must be lucky then.  VMS has had one since day one.  Working set 
>>parameters can be set by either authorization information, as parameters 
>>to the $CREPRC system service (which creates a process) or explicitly 
>>during process execution (SET WORKING_SET command, or $ADJWSL system 
>>service).
> 
>OK, so which of the OSes that support working set scheduling do so
>without having to be told by some external agent what the working set of
>a process at some given time is?  (Do the VMS calls even tell the OS
>that, or do they just tell it how big the working set is?)

They only specify the size.  A process in VMS comes equipped with three
significant numbers controlling working set allocation:  The limit, quota,
and extent.

The working set limit is essentially the "zero point" for working set adjust-
ment:  When an image exits, the process's working set is set back to its
working set limit.  (Recall that VMS doesn't create a new process per executed
image; rather, it re-uses the existing context.  This is done by having two
threads of control within each process.  User code runs in the user-level
thread; DCL, the equivalent of the Shell, runs in the supervisor-level thread.
(Actually, there are two more threads, executive and kernel, but they
aren't at issue here.)  When user-level code exits, all user-level structures
are discarded, all user-level channels are closed, and all pages of virtual
memory owned by user mode are freed.  DCL then goes on to the next command.
In over-all effect, this is like the Shell's use of fork/exec except with
light-weight threads (shared memory space).  It's not EXACTLY like light-
weight threads as they are normally pictured because the hardware protects
the supervisor-mode thread from the user-mode thread.)

Anyhow, returning to working sets:  The second parameter, the working set
quota, is the amount of physical memory that the process is guaranteed to have
available to it.  On a busy system, it is the upper bound on the process's
working set size.

If the system has plenty of spare memory, a process is allowed to exceed its
working set quota.  The third parameter, working set extent, is a hard limit
on the size of a process's working set.  When a process with a working set
size below its quota needs pages, and none are available, the system will
grab them back from processes that have "borrowed" pages and are over quota.
(If there are no "borrowed" pages to be found, the system takes a number of
other actions; ultimately, it swaps entire processes out, making all of their
pages available to satisfy the demand for "guaranteed" pages.)

The net effect is that when the system is short of memory, processes page
against themselves, while when the system has memory to spare, they page
against the system-wide pool of pages, but are still subject to an upper
limit on size.
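
To make the quota/extent rules concrete, here is a minimal sketch of the
policy as described above.  It is not VMS code: the names Process, may_grow,
reclaim_borrowed and borrow_threshold are invented for illustration, and the
single free-page threshold stands in for whatever system parameters VMS
actually consults.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    wsquota: int     # guaranteed working set size, in pages
    wsextent: int    # hard ceiling on working set size, in pages
    wssize: int = 0  # pages currently in the working set


def may_grow(p, free_pages, borrow_threshold):
    """Return True if process p may add one more page to its working set.

    borrow_threshold is a hypothetical tunable: the number of free pages
    the system must have before anyone may exceed quota.
    """
    if p.wssize < p.wsquota:
        return True              # below quota: the page is guaranteed
    if p.wssize >= p.wsextent:
        return False             # extent is a hard limit
    return free_pages > borrow_threshold  # "borrowing" needs spare memory


def reclaim_borrowed(procs, needed):
    """Take back up to `needed` pages from processes that are over quota.

    Returns the number of pages actually reclaimed; if it is less than
    `needed`, the caller must fall back on other measures (e.g. swapping).
    """
    reclaimed = 0
    for p in procs:
        over = max(0, p.wssize - p.wsquota)
        take = min(over, needed - reclaimed)
        p.wssize -= take
        reclaimed += take
        if reclaimed == needed:
            break
    return reclaimed
```

With memory tight, may_grow refuses anything past quota, so a process that
needs a new page must give up one of its own: it pages against itself.  With
memory to spare, it can keep growing until it reaches its extent.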

There are various parameters that control things like how many pages the
system must have free before it will allow processes to exceed their working
set quotas.  In fact, there are probably 10 or 20 user-settable parameters
to let you tune the performance of the system.  Some can be changed on a
running system; others require a re-boot.  Setting them is something of a
black art; fortunately, standard VMS comes with programs to set them auto-
matically, based on actual workload.

>Jerry Leichter claimed that the VMS software people showed that a
>reference bit is not necessary, given the appropriate algorithms; does
>VMS now have appropriate algorithms to determine the contents of the
>working set without requiring a reference bit, or did the designers
>decide that the appropriate algorithm is to believe what the user or the
>application tells you and not try to figure it out for yourself?  (Not a
>rhetorical question - I'm willing to accept that the latter is the
>appropriate algorithm, given sufficient evidence to demonstrate that
>claim.)

The basic algorithm is simple:  Pages lost from a process working set go to
the end of a free list.  If a fault occurs for a page on the free list, it
can be given back to the process; its contents will be unchanged.

The lists are not actually searched, since the structures defining virtual
memory contain sufficient information to find a page still on the free list
with little effort.  Such "soft faults" are quite fast - not as fast as not
having incurred the fault at all, of course, but perhaps a few tens of in-
struction times.
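
A rough sketch of the idea, in Python rather than anything resembling the
real VMS data structures: the FreeList class, its method names, and the use
of a (process id, virtual page number) key are all assumptions made for
illustration.  The point is only that a released page keeps its identity
until the frame is reused, and that it can be found by direct lookup rather
than by searching.

```python
from collections import OrderedDict


class FreeList:
    """Pages released from working sets, oldest first, contents intact."""

    def __init__(self):
        # Key: (process id, virtual page number); value: physical frame.
        # OrderedDict gives FIFO order plus direct lookup by key.
        self._pages = OrderedDict()

    def release(self, pid, vpn, frame):
        """A page leaves a working set and joins the tail of the list."""
        self._pages[(pid, vpn)] = frame

    def fault(self, pid, vpn):
        """Soft fault: return the frame if it is still on the list.

        Returns None if the frame has already been reused, in which case
        the caller needs a real ("hard") fault that reads from disk.
        """
        return self._pages.pop((pid, vpn), None)

    def allocate(self):
        """Give the oldest free frame to a new owner; old contents are lost."""
        (_pid, _vpn), frame = self._pages.popitem(last=False)
        return frame
```

Because frames are taken from the head and released pages join the tail, a
page that is touched again soon after being removed is very likely still
there.  That is what lets the free list act as a substitute for a reference
bit.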

There's more to this, of course.  For example, a "dirty" page can't go to the
free list; instead, it goes to a dirty list.  When the dirty list gets long
enough, the modified page writer writes its pages out and eventually moves
them to the free list.  A page remains eligible for return to the process via
a "soft" fault until it is actually given to some other process.  Under normal
conditions, pages actually within the working set of a process are almost
certain to be found on a free list even if grabbed.
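
The two-list arrangement can be sketched the same way.  Again this is an
illustration under assumptions, not the VMS implementation: PageLists,
write_threshold and the "written" log are invented names, and the writer
here drains the whole dirty list in one pass, which is a simplification.

```python
from collections import OrderedDict


class PageLists:
    def __init__(self, write_threshold):
        self.free = OrderedDict()      # clean pages, reusable at once
        self.modified = OrderedDict()  # dirty pages awaiting write-back
        self.write_threshold = write_threshold  # hypothetical tunable
        self.written = []              # stand-in for the backing store

    def release(self, key, dirty):
        """A page leaves a working set; dirty pages go to the modified list."""
        if dirty:
            self.modified[key] = True
        else:
            self.free[key] = True
        if len(self.modified) >= self.write_threshold:
            self.flush_modified()

    def flush_modified(self):
        """Modified page writer: write dirty pages, then move them to free."""
        while self.modified:
            key, _ = self.modified.popitem(last=False)
            self.written.append(key)
            self.free[key] = True

    def soft_fault(self, key):
        """True if the page was still on either list and was handed back."""
        for lst in (self.modified, self.free):
            if lst.pop(key, None):
                return True
        return False
```

Note that a page rescued from the modified list never costs a disk write at
all, and one rescued from the free list costs no disk read.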

There are all sorts of additional twists.  For example, some pages can be
locked in the working set.  Any program can do this, though it's rare that
user-mode programs do.  Rather, things like pages containing page tables get
locked in the working set.  (Some pages HAVE to be locked in the working set
to avoid deadlocks in the pager.)  Also, when VMS scans for pages to "steal"
from a process, it keeps enough context to try to avoid recently-faulted-in
pages.  (I don't recall the details.)

Does this all work?  Well, both VMS and (these days) Unix seem to support
virtual memory on VAXes with a great deal of success.  I haven't seen anyone
claim to have an architecture with roughly equivalent CPU power, memory, I/O
bandwidth, and so on, but WITH page reference bits, that supports more
processes or does measurably better at VM management.  The VAX architecture
has been pretty heavily studied - there are a number of published papers -
and none of those that I have read has found a bottleneck in the page
replacement code.

							-- Jerry