Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!cmcl2!lanl!jlg
From: jlg@lanl.ARPA (Jim Giles)
Newsgroups: net.arch
Subject: Re: Reasons For Large Main Memories
Message-ID: <7541@lanl.ARPA>
Date: Mon, 15-Sep-86 19:21:10 EDT
Article-I.D.: lanl.7541
Posted: Mon Sep 15 19:21:10 1986
Date-Received: Mon, 15-Sep-86 22:13:05 EDT
References: <1161@bu-cs.bu-cs.BU.EDU> <8529@duke.duke.UUCP> <672@ur-tut.UUCP> <7418@lanl.ARPA> <684@ur-tut.UUCP>
Reply-To: jlg@a.UUCP (Jim Giles)
Organization: Los Alamos National Laboratory
Lines: 58
Keywords: virtual memory appropriate

In article <684@ur-tut.UUCP> tuba@ur-tut.UUCP (Jon Krueger) writes:
>...
>    3. applications which can trivially manage their own
>       spaces, with acceptable development and maintenance
>       costs, which achieve higher levels of performance
>       than their virtual memory equivalents
>...
>I invite anyone to submit examples of type 3.  I haven't got any, but I
>suspect there are a few.

Yes, quite a few!  It is my opinion that a large share of all scientific
computing falls into this category.  Certainly several of the codes I've
written (parts of) or used do.

I wrote an EMP (Electro-Magnetic Pulse) simulation code, for example, which
was essentially a large 3-d grid with a satellite model inside - you hit it
with x-rays and iterate Maxwell's equations on the grid.  Maxwell's
equations require you to keep 9 numbers for each position in the grid:
3 electric field components, 3 magnetic field components, and 3 current
density components (x-rays knock free charges off the body of the
satellite, and the drift of these charges through the EM field causes a
current).

Now, take a rectangular grid that's 100 cells on a side, or 1 million
cells (100^3).  That's 9 million floating point numbers for the grid.
This clearly didn't fit in the CDC 7600 I was using at the time (1975),
so I had to 'page' the grid.
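The sizes quoted above can be checked with a few lines of arithmetic.  This
is only a sketch: the grid dimensions and the 9 values per cell come from
the post, but the function name and the three-plane working set (a plane
plus one neighbor on each side) are assumptions made for illustration, not
details of the original code.

```python
# Sizes taken from the post: a 100 x 100 x 100 grid, 9 values per cell.
CELLS_PER_SIDE = 100
VALUES_PER_CELL = 9   # 3 electric, 3 magnetic, 3 current density components
WINDOW_PLANES = 3     # assumption: a plane plus its two neighboring planes


def grid_footprint(n=CELLS_PER_SIDE, values=VALUES_PER_CELL):
    """Return (cells, numbers in the whole grid, numbers in one plane)."""
    cells = n ** 3
    total = cells * values
    per_plane = n * n * values
    return cells, total, per_plane


cells, total, per_plane = grid_footprint()
print(cells, total, per_plane)        # 1000000 9000000 90000
print(WINDOW_PLANES * per_plane)      # 270000 numbers held at once
```

Under that assumption the resident working set is a few hundred thousand
numbers rather than 9 million, which is what makes plane-at-a-time paging
attractive.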
The paging was made fairly easy by the cyclical nature of the required
accesses: as soon as I was done with a plane (that is, I had updated a
plane of the grid and all its neighbors) I could flush it out to disk and
start reading the furthest-ahead plane that would now fit (so that the
input MIGHT complete before I needed the data).  Note that the 'paging'
activity took place at a natural dividing point of the code: outside the
'plane' loop.  As a result, the inner 2 loops of the code (a 3-d loop,
remember) were optimized as if the whole problem fit in memory.

Smaller grids were handled similarly - only the 'paging' might be done on
a number of planes of the grid at once instead of one at a time.  On a
larger grid, the 'paging' scheme could be moved into the next lower loop,
and the 'pages' would have been half-planes or even strips of the grid.
For a very small grid (one that almost fits in memory), the paging scheme
could be modified to swap only a few planes (the actual technique was to
keep resident the whole grid for as many field components as would fit
and page only the remaining components).

Since all the paging activity was isolated in the outer loops of the code,
and since there was no additional address decoding circuitry in the memory
interface, I claim that this code clearly ran faster than it would have
on a VM system.  I had no VM computer to benchmark against at the time, so
I don't have numbers to prove this claim (even if you were likely to
believe benchmark results that you couldn't test yourself).

Very many scientific codes have a structure similar to the one described
above: all finite differencing codes, of course, as well as finite element
codes, etc.  Since this is the case, I claim that your class 3 set is well
occupied by real world production programs.

J. Giles
Los Alamos
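The plane-at-a-time scheme described above can be sketched roughly as
follows.  This is a minimal illustration under stated assumptions, not the
original code: the file layout, the names (read_plane, update_plane, sweep,
demo), the tiny grid sizes, and the averaging stencil are all hypothetical
stand-ins.  It also reads each plane synchronously and writes to a second
file, whereas the post describes updating in place and starting the read
ahead of need.

```python
import os
import tempfile
from array import array

PLANE_SIZE = 4   # values per plane (tiny, for illustration only)
N_PLANES = 6     # planes in the grid


def read_plane(f, k):
    """Read plane k (PLANE_SIZE doubles) from the backing file."""
    plane = array("d")
    f.seek(k * PLANE_SIZE * plane.itemsize)
    plane.fromfile(f, PLANE_SIZE)
    return plane


def update_plane(below, here, above):
    """Stand-in for the inner two loops: average each cell with the cells
    at the same position in the adjacent planes.  NOT Maxwell's equations."""
    return array("d", ((b + h + a) / 3.0 for b, h, a in zip(below, here, above)))


def sweep(src_path, dst_path):
    """One pass over the grid, holding only three planes in memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        below = read_plane(src, 0)
        here = read_plane(src, 1)
        below.tofile(dst)                       # boundary plane copied unchanged
        for k in range(1, N_PLANES - 1):        # the outer 'plane' loop
            above = read_plane(src, k + 1)      # 'page in' the next plane
            update_plane(below, here, above).tofile(dst)   # 'page out' the result
            below, here = here, above           # slide the window forward
        here.tofile(dst)                        # other boundary plane


def make_grid(path):
    """Write a test grid in which every value of plane k equals k*k."""
    with open(path, "wb") as f:
        for k in range(N_PLANES):
            array("d", [float(k * k)] * PLANE_SIZE).tofile(f)


def demo():
    """Build a small grid on disk, sweep it, return one value per output plane."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "grid_in.bin")
        dst_path = os.path.join(tmp, "grid_out.bin")
        make_grid(src_path)
        sweep(src_path, dst_path)
        with open(dst_path, "rb") as out:
            return [read_plane(out, k)[0] for k in range(N_PLANES)]
```

The point of the layout is that all file traffic sits in the outer loop of
sweep, while update_plane sees only in-memory arrays and can be written as
if the whole grid were resident.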