Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbatt!ihnp4!inuxc!pur-ee!uiucdcs!uiucuxc!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: net.arch
Subject: Re: paging and loading
Message-ID: <5100148@ccvaxa>
Date: Thu, 9-Oct-86 11:00:00 EDT
Article-I.D.: ccvaxa.5100148
Posted: Thu Oct  9 11:00:00 1986
Date-Received: Tue, 14-Oct-86 05:28:44 EDT
References: <949@usl.UUCP>
Lines: 100
Nf-ID: #R:usl.UUCP:949:ccvaxa:5100148:000:4905
Nf-From: ccvaxa.UUCP!aglew    Oct  9 10:00:00 1986


> [Eric Green {akgua,ut-sally}!usl!elg]:

First, my own conclusions: virtual memory may not be necessary for big
single-job systems, but it is for just about anything else. Since I don't see
myself building big single-job systems in the near future (or any systems,
sigh), I will take virtual memory as a given. Still, I can see reasons not to
have virtual memory in certain circumstances.


> A rehash of past arguments, about the working through huge matrix problem:
>    "By the time we reach the end of the list, the front of the list will
> have been paged out".
>     If the front of the matrix has been paged out, that means that the
> matrix was too big to fit in physical memory. That means that with no
> paging, the performance would be 0 -- a 100% degradation.

Talk about spurious line noise! If people explicitly overlay the matrix,
they can still process it - that's a lot better than 0 performance.
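
To make the overlay point concrete, here is a minimal sketch in Python of
processing a matrix that does not fit in memory, one slab of rows at a time.
The names are hypothetical: read_rows stands in for whatever explicit I/O
brings a slab in from disk, and the column-sum computation is only a
placeholder for the real work.

```python
def column_sums_with_overlays(read_rows, n_rows, n_cols, rows_per_overlay):
    """Sum each column of a matrix, keeping only one overlay resident.

    read_rows(start, stop) is a hypothetical callback that returns rows
    start..stop-1 as a list of lists, e.g. by reading them from disk.
    """
    totals = [0.0] * n_cols
    for start in range(0, n_rows, rows_per_overlay):
        stop = min(start + rows_per_overlay, n_rows)
        overlay = read_rows(start, stop)  # the only slab held in memory
        for row in overlay:
            for j, value in enumerate(row):
                totals[j] += value
    return totals
```

The program decides what is resident and when, so performance is bounded by
the I/O it chooses to do rather than dropping to zero.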


>   "We can take advantage of disk seek times -- just put the data on
> close tracks."
>     All the disk systems I have ever seen become somewhat fragmented
> over time. 

Right. But high performance systems give you a way to make certain files
physically contiguous. Whether it's worth the bother is another question.


>     Also note that, because of intelligent i/o controllers, pulling a
> page off of disk takes virtually no CPU time, since the processor is
> running another process while the first one is blocked waiting for the
> i/o processor to bring in the page (assuming a multi-tasking system --
> Crays and such DO multi-task, no?). In other words, taking advantage of
> explicit multi-processing.

Three things: (1) it is running *another* process, not the one that you want
to make really fast. The crux is, sometimes you want to make one process
really fast, at the cost of its mates. (2) Sure, your CPU isn't idle - it's
taking all sorts of TLB misses and cache faults and swapping out its
registers as it does the context switch between the blocked, paging
process and the new one. The IBM 3090 has already shown an example of a
system where it is faster to page synchronously rather than context switch,
in which case your CPU *is* idle while paging - and these are likely to
become more common. (3) By initiating overlays in advance, asynchronously, a
non-virtual process also does not leave the CPU idle - nor does it waste
CPU cycles in a context switch.
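
Point (3) is ordinary double buffering. A rough sketch in Python, assuming a
hypothetical load_overlay(k) that does the I/O for overlay k and a
hypothetical compute(data) that does the work; a background thread stands in
for the asynchronous I/O request a real system would issue.

```python
import threading

def process_with_prefetch(load_overlay, compute, n_overlays):
    """Compute on overlay k while overlay k+1 is being loaded."""
    results = []
    if n_overlays == 0:
        return results
    current = load_overlay(0)  # the first load cannot be overlapped
    for k in range(n_overlays):
        slot = {}
        loader = None
        if k + 1 < n_overlays:
            # Start the next load before computing on the current overlay.
            def fetch(index=k + 1, out=slot):
                out["data"] = load_overlay(index)
            loader = threading.Thread(target=fetch)
            loader.start()
        results.append(compute(current))
        if loader is not None:
            loader.join()  # blocks only if the I/O is slower than compute
            current = slot["data"]
    return results
```

The same process keeps the CPU busy throughout, so no context switch to
another job is needed to hide the I/O time.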

{Has anyone a good way of estimating time lost to context switches - totting
up register save, misses, etc.? I need a fairly abstract definition of
context for a model of multiprocessing that shows that multiprocessing is
sometimes faster than any uniprocessor system can be, for the same problem}
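
One crude way to tot this up, offered only as a sketch: treat the cost of a
switch as register save/restore plus the TLB and cache refills the incoming
process suffers. Every parameter below is a hypothetical input to be measured
or guessed for a given machine; none of the figures come from a real system.

```python
def context_switch_cost(n_registers, cycles_per_register,
                        tlb_misses, cycles_per_tlb_miss,
                        cache_misses, cycles_per_cache_miss):
    """Additive estimate, in cycles, of one context switch."""
    # Save the outgoing registers and restore the incoming ones.
    save_restore = 2 * n_registers * cycles_per_register
    # Refill penalty paid by the incoming process after the switch.
    refill = (tlb_misses * cycles_per_tlb_miss
              + cache_misses * cycles_per_cache_miss)
    return save_restore + refill

def switching_wins(page_in_cycles, switch_cost):
    """True if switching away and back leaves cycles for useful work."""
    return page_in_cycles > 2 * switch_cost
```

Under this model, paging synchronously wins whenever the page-in latency is
shorter than two switches, which is the situation described in (2) above.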

 
> The only one which is really valid is the table lookup times
> necessary for paging. There is also the overhead of paging in the MMU lookup
> table the initial time. 
Should that be prefetched at a context switch?

> [Page size]: If 4k was OK back in Multics days, 8K or even 16K might work
> today. 
Gould already has 8K pages. The Japanese have gone for 1M pages.

> Also note that this is WORST CASE. Your mileage will probably be better.
Sometimes you have to design for worst case (like, when the benchmarks that
customers use to decide whether they should buy your machine are your worst
case scenarios). I have a cute little idea for cramming more stuff into a
TLB if the system will occasionally try to allocate contiguous physical
memory. The architect I showed it to said yes, my idea was better than the
standard trick for multiple page size systems - but it wasn't worth doing,
because he would still have to make the TLB large enough to handle the
worst case, where each contiguous chunk was the smallest page size. Sigh.
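
The worst-case objection can be illustrated with a toy count. This is not the
idea mentioned above, whose details are not given here; it only assumes a
hypothetical TLB in which one entry can map any run of virtual pages backed
by consecutive physical frames, ignoring the alignment and power-of-two size
restrictions real hardware would impose.

```python
def tlb_entries_needed(frames):
    """Count entries needed when one entry covers a contiguous run.

    frames[i] is the physical frame backing virtual page i.
    """
    if not frames:
        return 0
    entries = 1
    for prev, cur in zip(frames, frames[1:]):
        if cur != prev + 1:  # contiguity broken, so a new entry is needed
            entries += 1
    return entries
```

With fully contiguous allocation, tlb_entries_needed([10, 11, 12, 13]) is 1,
but with fully scattered frames, tlb_entries_needed([10, 20, 30, 40]) is 4:
one entry per smallest page, which is the case the TLB must be sized for.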


> Especially considering that
> because associative lookups probably run about the same speed as
> checking bounds registers, the memory speed would be otherwise about
> the same.

Do really fast systems use bounds registers? What bounds registers?


> Another argument was that paging messes up pipelining...
> Requiring vectors to be page-aligned and having the pages the
> same size as the registers or a binary size larger would eliminate any
> pipeline lossage (just do your virtual memory grokking BEFORE you get
> your pipeline rolling).

This won't work if you want to pipeline calculations of a vector of addresses
(or indexes) with the actual access, i.e. when the addresses are not available
before the operation starts. So, either you don't pipeline this (a
considerable loss, but one that can be tolerated) or you make it possible
to recover from a page fault in the middle of an operation (ouch).

> And, of course, MIPS machines & others prove that for
> ordinary code, pipelining and paging are not mutually exclusive.

Carefully designed instruction sets mean that page faults will only occur
at a limited number of known locations, making recovery easier.