Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!rice!ariel.rice.edu!preston
From: preston@ariel.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Memory hierarchies
Message-ID: <1991May7.152224.3146@rice.edu>
Date: 7 May 91 15:22:24 GMT
References: <1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG> <1991May7.061500.7485@marlin.jcu.edu.au>
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 66

>lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

>I have only limited experience with the new, fast-only-in-cache, machines,
>but I have to say that the code you need to get optimum performance is
>even more non-intuitive than that for the older vector architecture machines.
>Even worse, code which was previously optimal for vector machines, and which
>was OK on a wide variety of other machines, is now pessimal for these machines

The reality of big systems is that they are implemented
with a memory hierarchy.  Typically

    registers
    cache
    tlb
    ram
    disk

On a Cray, running vector stuff, it might look more like

    ram
    disk

but the hierarchy still exists.  A fair amount of the money spent on a
super is dedicated toward flattening the hierarchy.

For best results, you (or your compiler) should be conscious of the
implementation (not just the architecture) of the target machine.
Everyone knows about blocking for cache.  Well, you can also
block for registers, tlb, and ram.

Ken Kennedy would like to see programmers coding in a "blockable" style,
with compilers doing the actual blocking.  He makes an analogy with
vectorization.  When vectorizing compilers became available,
programmers learned to write somewhat stylized loops that they
expected the vectorizer to recognize and handle efficiently;
they learned to write vectorizable code.  In many respects,
the code was portable, in that it could be transformed,
by the compilers, to run efficiently on a variety of (vector) machines.

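To make the contrast concrete, here is a minimal sketch in C of the
same matrix multiply written twice: once as the plain triple loop
(roughly what a "blockable" style might look like) and once blocked
by hand.  It is only an illustration, not taken from Kennedy's work or
from any particular compiler.  The function names, the row-major
layout, and the tile parameter are all assumptions made for the
example; a real tile size would be tuned per machine and per level of
the hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Plain triple loop: c += a * b for n x n row-major matrices.
   This is the form one would hope a compiler could block by itself. */
void matmul_plain(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Helper (hypothetical name): smaller of two sizes, used to clip
   the last, possibly partial, tile. */
static size_t min_sz(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Hand-blocked version: the same arithmetic, reordered so that the
   inner three loops work on tile x tile pieces of a, b, and c.
   'tile' is an assumed tuning parameter, chosen so the pieces fit
   in whatever level of the hierarchy is being targeted. */
void matmul_blocked(size_t n, size_t tile,
                    const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_sz(ii + tile, n);
                size_t k_end = min_sz(kk + tile, n);
                size_t j_end = min_sz(jj + tile, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both routines accumulate into c, so the caller zeroes c first.  The
blocked one is the version that has to be retuned (or rewritten) for
each machine, which is exactly the work one would rather leave to the
compiler.
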
Currently, we see programmers blocking their code by hand for each
machine they have to use.  Kennedy (and others) hope to develop
adequate techniques to allow programmers to write more portable code,
trusting the compilers to find efficient blockings.

Some researchers at IBM think that the RAM model used by most
programmers for reasoning about their programs' complexity is fatally
flawed because it has only a single level of memory.  They propose a
couple of more sophisticated models that account for the memory
hierarchy in various ways.

    The Uniform Memory Hierarchy Model of Computation
    Bowen Alpern, Larry Carter, Ephraim Feig, Ted Selker
    FOCS 90 (Foundations of Computer Science)

Regarding the invention of blocking, there's an old paper

    Matrix Algebra Programs for the UNIVAC
    J. D. Rutledge and H. Rubenstein
    Wayne Conference on Automatic Computing Machinery and Applications
    March 1951

that discusses blocking various matrix routines for a memory hierarchy,
presumably including tape (though I don't have a copy of the paper
to be sure).