Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!rice!ariel.rice.edu!preston
From: preston@ariel.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Memory hierarchies
Message-ID: <1991May7.152224.3146@rice.edu>
Date: 7 May 91 15:22:24 GMT
References: <1991May2.162909.9165@news.arc.nasa.gov> <819@cadlab.sublink.ORG> <1991May7.061500.7485@marlin.jcu.edu.au>
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 66

>lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

>I have only limited experience with the new, fast-only-in-cache, machines,
>but I have to say that the code you need to get optimum performance is
>even more non-intuitive than that for the older vector architecture machines.
>Even worse, code which was previously optimal for vector machines, and which
>was OK on a wide variety of other machines, is now pessimal for these machines.

The reality of big systems is that they are implemented with a
memory hierarchy.  Typically

	registers
	cache
	tlb
	ram
	disk

On a Cray running vector code, it might look more like

	ram
	disk

but the hierarchy still exists.  A fair amount of the money spent
on a supercomputer goes toward flattening the hierarchy.

For best results, you (or your compiler) should be conscious
of the implementation (not just the architecture) of the target
machine.  Everyone knows about strip-mining for cache.  Well,
you can also block for registers, the TLB, and RAM.
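For concreteness, here is a minimal sketch of blocking for cache, using
matrix multiply.  It's written in Python only to show the loop structure;
the interpreter's overhead would swamp any cache effect, so don't expect
it to run faster.  The function names and the block-size parameter `bs`
are illustrative, not taken from any library.

```python
def matmul_naive(a, b, n):
    """Plain i-j-k triple loop: returns C = A * B for n x n lists of lists."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, bs):
    """Same product, computed one bs x bs tile at a time.

    bs is a tuning assumption: on real hardware it would be chosen so that
    roughly three bs x bs tiles fit in cache at once.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Update one tile of C from one tile of A and one tile of B;
                # min() handles edge tiles when bs does not divide n.
                for i in range(ii, min(ii + bs, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, n)):
                            ci[j] += aik * bk[j]
    return c


# Usage: both versions do the same arithmetic, only the order of the loops differs.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_blocked(a, b, 2, 1))  # [[19.0, 22.0], [43.0, 50.0]]
```

The tile loops (ii, kk, jj) walk over blocks; the inner loops reuse each
tile many times while it is, on a compiled machine, still in cache.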

Ken Kennedy would like to see programmers coding in a "blockable" style,
with compilers doing the actual blocking.  He draws an analogy
with vectorization.  When vectorizing compilers became available,
programmers learned to write somewhat stylized loops that they
expected the vectorizer to recognize and handle efficiently;
they learned to write vectorizable code.  In many respects,
the code was portable, in that the compilers could transform it
to run efficiently on a variety of (vector) machines.
Currently, we see programmers blocking their code by hand for
each machine they have to use.  Kennedy (and others) hope
to develop techniques good enough to let programmers write
more portable code, trusting the compilers to find efficient
blockings.
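As a rough illustration of the division of labor, here is a simple
matrix-vector loop as a programmer might write it, next to a hand-applied
register-blocking transformation (unroll-and-jam of the outer loop by 2,
plus scalar replacement) of the kind such a compiler might perform.  This is
a sketch of one well-known technique, not a claim about any particular
compiler's algorithm; the names are hypothetical, and in Python the
"registers" are just local variables.

```python
def matvec_simple(a, x, n):
    """The plain, 'blockable' source: y = A x for an n x n list of lists."""
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            y[i] += a[i][j] * x[j]
    return y


def matvec_unroll_jam(a, x, n):
    """Outer loop unrolled by 2 and the two inner loops jammed together.

    The running sums for rows i and i+1 stay in locals (registers, in
    compiled code), and each x[j] is fetched once for two rows instead of twice.
    """
    y = [0.0] * n
    m = n - n % 2                      # rows handled two at a time
    for i in range(0, m, 2):
        s0 = s1 = 0.0
        row0, row1 = a[i], a[i + 1]
        for j in range(n):
            xj = x[j]                  # one load feeds two multiply-adds
            s0 += row0[j] * xj
            s1 += row1[j] * xj
        y[i], y[i + 1] = s0, s1
    if n % 2:                          # cleanup: one leftover row when n is odd
        i = n - 1
        s = 0.0
        for j in range(n):
            s += a[i][j] * x[j]
        y[i] = s
    return y
```

The unroll factor (2 here) is the register-blocking parameter; a compiler
would pick it from the target's register count, which is exactly the
machine-specific detail the programmer would rather not hard-code.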

Some researchers at IBM think that the RAM model most
programmers use to reason about the complexity of their programs is
fatally flawed because it has only a single level of memory.
They propose a couple of more sophisticated models that account
for the memory hierarchy in various ways.
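To see why a single level of memory hides the difference, here is a simple
two-level count for n x n matrix multiply.  It is a textbook illustration,
not the UMH model itself.  The assumptions: a fast memory of M words, data
moved one word at a time, and b x b tiles with b dividing n.  T counts
arithmetic operations and Q counts words moved between the two levels.

```latex
\begin{aligned}
\text{RAM model:}\quad & T_{\text{naive}} = T_{\text{blocked}} = n^3 \text{ multiply-adds} = \Theta(n^3) \\
\text{two levels, naive loop, } n^2 \gg M:\quad & Q_{\text{naive}} = \Theta(n^3) \\
\text{two levels, } b \times b \text{ tiles, } 3b^2 \le M:\quad & Q_{\text{blocked}} \le \left(\tfrac{n}{b}\right)^3 \cdot 3b^2 = \frac{3n^3}{b} \\
\text{choosing } b = \sqrt{M/3}:\quad & Q_{\text{blocked}} \le 3\sqrt{3}\,\frac{n^3}{\sqrt{M}} = \Theta\!\left(\frac{n^3}{\sqrt{M}}\right)
\end{aligned}
```

The single-level model can't tell the two loop orders apart.  The two-level
count says blocking cuts memory traffic by a factor of roughly sqrt(M).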

	The Uniform Memory Hierarchy Model of Computation
	Bowen Alpern, Larry Carter, Ephraim Feig, Ted Selker
	FOCS 90 (Foundations of Computer Science)


Regarding the invention of blocking,
there's an old paper

	Matrix Algebra Programs for the UNIVAC
	J. D. Rutledge and H. Rubenstein
	Wayne Conference on Automatic Computing Machinery and Applications
	March 1951

that discusses blocking various matrix routines for a memory hierarchy,
presumably including tape (though I don't have a copy of the paper
to be sure).