Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!cmcl2!lanl!jlg From: jlg@lanl.ARPA (Jim Giles) Newsgroups: net.arch Subject: Re: VERY LARGE main memories: crypt Message-ID: <7847@lanl.ARPA> Date: Tue, 23-Sep-86 20:55:12 EDT Article-I.D.: lanl.7847 Posted: Tue Sep 23 20:55:12 1986 Date-Received: Wed, 24-Sep-86 22:01:37 EDT References: <1276@bu-cs.bu-cs.BU.EDU> <7537@lanl.ARPA> <1819@mmintl.UUCP> Reply-To: jlg@a.UUCP (Jim Giles) Organization: Los Alamos National Laboratory Lines: 64 In article <1819@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes: >In article <7537@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes: >>In article <1276@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >>>Am I missing something? What's the huge problem with this hypothetical >>>vadvise(CYCLIC)? Isn't it just MRU (most recently used)? >>The are two thing wrong with this scheme. 1) Suppose that you problem >>fails to fit in main memory by only a few page sizes. The above scheme >>will cycle each and every page of the whole data structure into memory >>and back out again on every time step. > >No it won't. Only the number of pages which don't fit are read in each >cycle. (E.g., if there are 103 pages and 100 will fit, 3 are read each >cycle.) In fact, MRU behaves optimally[1][2] for a purely cyclic access >pattern. In particular, it has less I/O overhead than setting aside a few >pages at the end to be overlaid. > The MRU algorithm (as far as I can see) does the following: Assume M pages of data are to be used cyclically and only N pages will fit. Initially, the first page is loaded and the code begins executing. At the end of page 1, the next page is loaded, and so on for N pages. Now you have a choice to make, do you flush page 1 or page N to make room for N+1. The MRU algorithm presumably flushes page N (the most recently used). And so you continue up to page M. Now on the second cycle through the data, the first N- 1 pages are already there, so no page fault occurs until page N is needed. Then a page fault occurs for every page until page M. This doesn't seem optimal to me. Or did I get the algorithm wrong? In this case, even LRU (least recently used) seems more efficient since some, at least, of the I/O is done asynchronously. The most efficient algorithm would be to load N pages at the start, when done with the first page flush it out and load the N+1st page - and so on: always flushing the most recently used and loading the most distantly required. This makes all the I/O as asynchronous as possible. If M-N is small compared to N, you could reduce the I/O requirement by loading N-M pages, skipping every M-Nth one, and then after every M-N pages, flush the most recent and load the most distantly needed (as before except that only M-N pages are being swapped in and out). There are other techniques (that I've mentioned in previous submissions) that reduce the I/O load or increase the amount of asynchronous I/O in many cases. Most of these techniques would be difficult to provide in a general purpose VM system. You might argue that MRU should be modified to include the technique I just mentioned. But how does a demand driven system know what the value of M is? How does it know that M remains unchanged from one cycle to the next? What is some other data structure (like an equation of state, for example) is being referenced following a random pattern of use - at the same time that the cyclically used data is being traversed: how do I advise the VM system that some of the pages should be referenced MRU and some should be LRU? I can solve all the above problems explicitly if I have control over memory contents. I can't off hand see any solution that a VM system could take advantage of. A lot of work has been done in the optimization of memory management for scientific codes. Many of the techniques that have resulted are not well suited to any demand driven context but, instead, rely upon knowledge of the data use patterns of specific algorithms. J. Giles Los Alamos