Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbatt!ihnp4!inuxc!pur-ee!uiucdcs!uiucuxc!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: net.arch
Subject: Re: paging and loading
Message-ID: <5100148@ccvaxa>
Date: Thu, 9-Oct-86 11:00:00 EDT
Article-I.D.: ccvaxa.5100148
Posted: Thu Oct 9 11:00:00 1986
Date-Received: Tue, 14-Oct-86 05:28:44 EDT
References: <949@usl.UUCP>
Lines: 100
Nf-ID: #R:usl.UUCP:949:ccvaxa:5100148:000:4905
Nf-From: ccvaxa.UUCP!aglew Oct 9 10:00:00 1986

> [Eric Green {akgua,ut-sally}!usl!elg]:

First, my own conclusions: virtual memory may not be necessary for big job
systems, but is for just about anything else. Since I don't see myself
building big single job systems in the near future (or any systems, sigh)
I will take virtual memory as a given. But, I still can see reasons not to
have virtual memory - in certain circumstances.

> A rehash of past arguments, about the working through huge matrix problem:
> "By the time we reach the end of the list, the front of the list will
> have been paged out".
> If the front of the matrix has been paged out, that means that the
> matrix was too big to fit in physical memory. That means that with no
> paging, the performance would be 0 -- a 100% degradation.

Talk about spurious line noise! If people explicitly overlay the matrix,
they can still process it - that's a lot better than 0 performance.

> "We can take advantage of disk seek times -- just put the data on
> close tracks."
> All the disk systems I have ever seen become somewhat fragmented
> over time.

Right. But high performance systems give you a way to make certain files
physically contiguous. Whether it's worth the bother is another question.

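To make the explicit-overlay point above concrete, here is a minimal sketch of
processing a matrix that is assumed not to fit in memory, one block of rows at
a time. Everything in it is hypothetical and chosen only for illustration: the
file layout (row-major doubles), the names write_matrix and overlay_row_sums,
and the rows_per_overlay budget. It is not a description of any particular
system's overlay mechanism.

```python
import array
import os
import tempfile


def write_matrix(path, rows, cols):
    """Write a rows x cols matrix of doubles to disk in row-major order.

    The values (r * cols + c) are made-up test data.
    """
    with open(path, "wb") as f:
        for r in range(rows):
            row = array.array("d", (float(r * cols + c) for c in range(cols)))
            row.tofile(f)


def overlay_row_sums(path, rows, cols, rows_per_overlay):
    """Return per-row sums while holding at most rows_per_overlay rows in memory."""
    sums = []
    with open(path, "rb") as f:
        for start in range(0, rows, rows_per_overlay):
            n = min(rows_per_overlay, rows - start)
            block = array.array("d")
            block.fromfile(f, n * cols)  # explicit overlay: load the next block
            for i in range(n):
                sums.append(sum(block[i * cols:(i + 1) * cols]))
            # block is replaced on the next iteration, so memory use stays bounded
    return sums


def demo():
    """Build a small matrix on disk and sum its rows three rows at a time."""
    rows, cols = 10, 4
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_matrix(path, rows, cols)
        return overlay_row_sums(path, rows, cols, rows_per_overlay=3)
    finally:
        os.remove(path)
```

The program decides what is resident and when, so it keeps making progress on
a matrix larger than physical memory. The price is that the programmer has to
write the blocking loop by hand.
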
> Also note that, because of intelligent i/o controllers, pulling a
> page off of disk takes virtually no CPU time, since the processor is
> running another process while the first one is blocked waiting for the
> i/o processor to bring in the page (assuming a multi-tasking system --
> Crays and such DO multi-task, no?). In other words taking advantage of
> explicit multi-processing.

Three things:

(1) It is running *another* process, not the one that you want to make
really fast. The crux is, sometimes you want to make one process really
fast, at the cost of its mates.

(2) Sure, your CPU isn't idle - it's taking all sorts of TLB misses and
cache faults and swapping out its registers as it does the context switch
between the blocked, paging process and the new one. The IBM 3090 has
already shown an example of a system where it is faster to page
synchronously rather than context switch, in which case your CPU *is* idle
while paging - and these are likely to become more common.

(3) By initiating overlays in advance, asynchronously, a non-virtual
process also does not leave the CPU idle - nor does it waste CPU cycles in
a context switch.

{Has anyone a good way of estimating time lost to context switches -
totting up register save, misses, etc.? I need a fairly abstract definition
of context for a model of multiprocessing that shows that multiprocessing
is sometimes faster than any uniprocessor system can be, for the same
problem}

> The only one which is really valid is the table lookup times
> necessary for paging. There is also the overhead of paging in the MMU lookup
> table the initial time.

Should that be prefetched at a context switch?

> [Page size]: If 4k was OK back in Multics days, 8K or even 16K might work
> today.

Gould already has 8K pages. The Japanese have gone for 1M pages.

> Also note that this is WORST CASE. Your mileage will probably be better.

Sometimes you have to design for worst case (like, when the benchmarks that
customers use to decide whether they should buy your machine are your worst
case scenarios).

I have a cute little idea for cramming more stuff into a TLB, if the system
will occasionally try to allocate contiguous physical memory. The architect
I showed it to said yes, my idea was better than the standard trick for
multiple page size systems - but it wasn't worth doing, because he would
still have to make the TLB large enough to handle the worst case, where
each contiguous chunk was the smallest page size. Sigh.

> Especially considering that
> because associative lookups probably run about the same speed as
> checking bounds registers, the memory speed would be otherwise about
> the same.

Do really fast systems use bounds registers? What bounds registers?

> Another argument was that paging messes up pipelining...
> Requiring vectors to be page-aligned and having the pages the
> same size as the registers or a binary size larger would eliminate any
> pipeline lossage (just do your virtual memory grokking BEFORE you get
> your pipeline rolling).

This won't work if you want to pipeline calculations of a vector of
addresses (or indexes) with the actual access, i.e. the addresses are not
available before the operation starts. So, either you don't pipeline this
(a considerable loss, but one that can be tolerated) or you make it
possible to recover from a page fault in the middle of an operation (ouch).

> And, of course, MIPS machines & others prove that for
> ordinary code, pipelining and paging are not mutually exclusive.

Carefully designed instruction sets mean that page faults will only occur
at a limited number of known locations, making recovery easier.
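
Going back to the page-size and worst-case TLB remarks above, the arithmetic
can be sketched roughly as follows. The 64-entry TLB and the 8 MB working set
are assumptions picked purely for illustration, not figures for any machine
mentioned in this thread. Only the page sizes (4K, 8K, 1M) come from the
discussion, and the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

TLB_ENTRIES = 64      # hypothetical TLB size, for illustration only
WORKING_SET = 8 * MB  # hypothetical working set, for illustration only


def tlb_reach(entries, page_size):
    """Bytes of memory mapped by a full TLB when every entry maps one page."""
    return entries * page_size


def entries_needed(working_set, page_size):
    """TLB entries needed to map working_set bytes, rounded up to whole pages."""
    return -(-working_set // page_size)  # ceiling division on integers


def worst_case_entries(working_set, page_sizes):
    """Entries needed if every contiguous chunk is only the smallest page size."""
    return entries_needed(working_set, min(page_sizes))


def report():
    """Print reach and entry counts for the page sizes mentioned above."""
    for page in (4 * KB, 8 * KB, 1 * MB):
        print(page // KB, "K pages: reach",
              tlb_reach(TLB_ENTRIES, page) // KB, "K, entries needed",
              entries_needed(WORKING_SET, page))


if __name__ == "__main__":
    report()
```

The worst_case_entries function is the architect's objection in miniature.
However cleverly large contiguous chunks are mapped, the TLB still has to be
sized for the case where nothing is contiguous and every entry maps only the
smallest page.
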