Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Summary: Why lazy store/reload is likely to be "free"
Message-ID: <9410@pur-ee.UUCP>
Date: 6 Oct 88 22:53:48 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG> <ANDREW.88Sep28160417@jung.ha <9725@cup.portal.com>
Organization: Purdue University Engineering Computer Network
Lines: 81

Pertaining to my posting about lazy store/reload of register frames:

In article <9725@cup.portal.com>, bcase@cup.portal.com writes:
> [Flaming as to lazy ops not being free and some obscure concept about
>  needing to predict the future...]
> The reason it is one is that you can't predict the future.
> [More flaming about memory interference...]
> ...  You must be able to predict conditional branches!
> 
> Lastly, a register window scheme with multiple windows provides hysteresis
> *without any* memory references at all!  The two-window-plus-laziness
> scheme doesn't.
>
> ...  True, if you could look ahead at
> the instruction stream, predict correctly all the conditional branches
> etc., then you could know exactly where to insert the lazy loads and
> stores.  But I think this is unrealistic.

You don't seem to know what lazy operations are.  A lazy operation is NOT a
simple "delayed" load or store, and it isn't triggered by an opcode being
executed; it is an operation which is automatically triggered either when
certain conditions exist which favor its efficient completion or when its
result is required --  one does NOT insert lazy operations in anything.
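
To make the two triggers concrete, here is a minimal Python sketch of a
single lazy store.  The class and method names (LazyStore,
on_idle_memory_cycle, require) are invented for illustration and do not
describe any real hardware interface; memory is modeled as a plain dict.

```python
class LazyStore:
    """A pending store that completes on an idle memory cycle or on demand."""

    def __init__(self, memory, addr, value):
        self.memory = memory      # assumed model of memory: a dict
        self.addr = addr
        self.value = value
        self.done = False
        self.stalled = False      # True if completion had to be forced

    def _complete(self):
        self.memory[self.addr] = self.value
        self.done = True

    def on_idle_memory_cycle(self):
        # Favorable condition: the memory port is free, so finish at no cost.
        if not self.done:
            self._complete()

    def require(self):
        # The result is needed now: finish immediately, and record that
        # the requester had to wait if the store was still pending.
        if not self.done:
            self.stalled = True
            self._complete()
        return self.memory[self.addr]
```

Note that nothing in the instruction stream names the store; it completes
either because an idle cycle came along or because someone needed it.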

Let's say you have 8 non-lazy windows (and one heck of a lot of valuable die
space consumed by them).  What do you do when the 9th nested call is made?
The 10th?  You do a sit-and-wait-for-it burst store, that's what...  would
you really describe that as being "*without any* memory references at all"?
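
As a rough illustration of the overflow case, the following Python sketch
counts burst stores and burst reloads for a non-lazy window file driven by
a call/return trace.  It is a deliberately simplified model: the function
name, the trace encoding ('c' for call, 'r' for return), and the policy of
spilling exactly one frame per overflow are all assumptions, not a
description of any particular processor.

```python
def count_window_spills(trace, num_windows=8):
    """Return (spills, refills) for a non-lazy register window file.

    trace is a string of 'c' (call) and 'r' (return) events.
    """
    resident = 0   # frames currently held in register windows
    saved = 0      # frames that have been spilled to memory
    spills = 0     # burst stores the processor had to wait for
    refills = 0    # burst reloads the processor had to wait for
    for op in trace:
        if op == "c":
            if resident == num_windows:
                # No free window: burst-store the oldest frame and reuse it.
                spills += 1
                saved += 1
            else:
                resident += 1
        elif op == "r":
            if resident > 1:
                resident -= 1
            elif resident == 1 and saved > 0:
                # The caller's frame lives in memory: burst-reload it.
                refills += 1
                saved -= 1
            elif resident == 1:
                resident = 0   # returned from the outermost frame
    return spills, refills
```

Under these assumptions, ten nested calls followed by ten returns on an
8-window file cost two burst stores and two burst reloads, each one a
stall.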

With lazy store/reload built-in, by the time you make a call nested deeper
than you have windows, it is very likely that you've got at least one
register set saved so that you can immediately reuse it -- without waiting
for any memory references.  When a procedure returns, the frame which was
saved begins to lazily reload its values into the set it was flushed from.
Now, you can argue that there might not be enough time between calls or
between returns to lazily store or reload a set, but that's unlikely
because:

Calls:		The lazy stores only have to store registers which are live
		and dirty, i.e., whose value will be referenced after return
		from the call and which also differs from any value stored
		somewhere in memory (e.g., a variable) by the programmer's
		code.  In most RISC processors, the only instructions able
		to make a register dirty are register = register op
		register...  which all have a free memory reference cycle!
		In the worst case, you'd lag behind by just one register
		store, which sure beats doing a non-lazy burst every so
		often. QED.

Returns:	The obvious worst case is if the instruction immediately
		after each call is a return.  However, if you do return
		immediately, then those registers were not live after the
		call (see above):  you would not have saved 'em in the first
		place, so they'd be restored in zero time.  Since live
		values are usually kept in registers primarily to be
		operated upon, a similar argument to that given for calls
		can be applied...  lazy reloading might not keep up with
		returns, but it shouldn't be far behind.
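
The argument for calls can be sketched as a toy Python model.  Everything
here is an assumption made for illustration: the instruction encoding
(("alu", dst), ("load", dst), ("store", None)), the rule that each ALU
instruction frees the memory port for exactly one lazy store, and the
decision to ignore liveness (every dirty register is treated as live).

```python
def store_backlog(instrs):
    """Return (max_lazy_backlog, burst_size) for an instruction list.

    max_lazy_backlog: the most registers ever waiting to be lazily stored.
    burst_size: registers dirty at the end if nothing is stored lazily,
                i.e. what a non-lazy burst store would have to write.
    """
    pending = []          # dirty registers not yet stored, oldest first
    dirty_no_lazy = set() # dirty registers when no lazy storing is done
    max_backlog = 0
    for op, dst in instrs:
        if op == "alu":
            # register = register op register leaves the memory port idle,
            # so one pending register can be stored during this cycle.
            if pending:
                pending.pop(0)
            # The ALU result now differs from memory: dst becomes dirty.
            if dst not in pending:
                pending.append(dst)
            dirty_no_lazy.add(dst)
        elif op == "load":
            # A freshly loaded register matches memory, so it is clean.
            if dst in pending:
                pending.remove(dst)
            dirty_no_lazy.discard(dst)
        # "store" uses the memory port itself and dirties nothing.
        max_backlog = max(max_backlog, len(pending))
    return max_backlog, len(dirty_no_lazy)
```

In this model the lazy backlog can never exceed one register, while the
burst size grows with the number of distinct registers written.  Real
timings depend on the instruction set, which is exactly the experiment
worth running.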

How unlikely?  Specify an instruction set and timings and find out.  That's
one of the reasons I posted this -- to inspire a bit of thought and research.
However, one has very good reason to believe it will work quite well, and if
it works even nearly as well as non-lazy windowing but requires far fewer
windows, just look at all that valuable die space you just bought!

As for predicting the future, it has very little to do with lazy operation;
moreover, it's a misnomer.  When statically examining code (e.g., at compile
time), there is no past or future: you know everything probabilistically, and
that's usually good enough.  I agree that hardware can't predict the future,
so you
ought not try to make it do that...  hardware is good at other things, like
telling you exactly which way THIS branch goes or which memory module THIS
pointer references.  The static/dynamic tradeoffs are what my research group,
CARP, is all about.

     __         /|
  _ |  |  __   / |  Compiler-oriented
 /  |--| |  | |  |  Architecture
/   |  | |__| |_/   Researcher from
\__ |  | | \  |     Purdue
    \    |  \  \
	 \      \   hankd@ee.ecn.purdue.edu