Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!imagen!atari!portal!cup.portal.com!bcase
From: bcase@cup.portal.com
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <9725@cup.portal.com>
Date: 4 Oct 88 20:24:20 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG> <ANDREW.88Sep28160417@jung.ha
Organization: The Portal System (TM)
Lines: 49
XPortal-User-Id: 1.1001.5156

>	LAZY STORE/LOAD SHOULD BE USED.
>you need only two *actual* register sets and more help only a very little.
[SEE THE ORIGINAL FOR ADDITIONAL VERBOSE CONTEXT]
>one can pretend to have as many register sets as
>one wants, even though the hardware need implement only two:  by the time a
>register set must be reused for another set, the set will have been saved
>using lazy stores, hence, it will be ready and waiting.  Likewise, when a set
>is to be removed, the hardware begins lazy-reloading of the previous set,
>hence no delay is seen.  Compare this to the very-hard-to-hide delay of using
>burst stores, with or without register windowing....

In my opinion, this is BS:  This is tantamount to saying you can predict
the sequence of calls and returns!!  If my CPU starts lazily saving but
the next thing it does is return, it has wasted all the memory cycles spent
lazily saving.  If my CPU lazily restores but the next thing it does is
call, it has wasted all the memory cycles spent restoring.  This is related
to the question:  "How many windows should be saved/restored on a call/
return?"  This question was answered by the Berkeley people:  exactly one.
The reason it is one is that you can't predict the future.
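To put numbers on the waste, here is a rough sketch in Python.  It assumes
a simplified two-set model that is my own illustration, not anyone's actual
hardware:  every call starts a lazy save of the caller's set, and every
return starts a lazy restore of the set below.  A save counts as wasted if
the very next event is a return, and a restore counts as wasted if the very
next event is a call.  The function name and the 'C'/'R' trace encoding are
made up for the example.

```python
def wasted_lazy_transfers(trace):
    """Count lazy saves and restores that turn out to be useless.

    trace is a string of 'C' (call) and 'R' (return) events.
    Assumed model: a call starts a lazy save, a return starts a lazy restore.
    """
    wasted_saves = 0
    wasted_restores = 0
    # Look at each event together with the event that follows it.
    for prev, nxt in zip(trace, trace[1:]):
        if prev == "C" and nxt == "R":
            # Saved the caller's set, then came straight back to it.
            wasted_saves += 1
        elif prev == "R" and nxt == "C":
            # Restored the set below, then immediately overwrote it.
            wasted_restores += 1
    return wasted_saves, wasted_restores


# A loop that calls a leaf routine three times.
print(wasted_lazy_transfers("CRCRCR"))  # (3, 2)
```

In this model a loop calling a leaf routine wastes essentially every lazy
transfer it starts, and that is an extremely common call pattern.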

"But," you say, "the lazy cycles aren't wasted since they don't interfere
with normal memory references."  Sorry, I don't buy this:  to insert
memory references so that they don't interfere, you must be able
to predict the pattern of memory references.  Sure, you can try looking
ahead and all that, but this is *not simple*.  You must be able to predict
conditional branches!

Lastly, a register window scheme with multiple windows provides hysteresis
*without any* memory references at all!  The two-window-plus-laziness
scheme doesn't.
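The hysteresis point can be sketched with a toy model, again in Python and
again under assumptions of my own.  The windowed file spills exactly one
window on overflow and fills exactly one on underflow.  The two-set lazy
scheme is modeled crudely as moving one register set per call and one per
return.  Both function names are hypothetical, and traffic is counted in
whole register sets moved to or from memory.

```python
def windowed_traffic(trace, nwindows):
    """Register sets moved to/from memory by an nwindows-deep window file.

    trace is a string of 'C' (call) and 'R' (return) events and is assumed
    never to return past the starting frame.
    """
    resident = 1   # frames currently held in registers
    traffic = 0
    for event in trace:
        if event == "C":
            if resident == nwindows:
                traffic += 1       # overflow: spill exactly one window
            else:
                resident += 1
        elif event == "R":
            resident -= 1
            if resident == 0:
                traffic += 1       # underflow: fill exactly one window
                resident = 1
        else:
            raise ValueError("trace may contain only 'C' and 'R'")
    return traffic


def lazy_two_set_traffic(trace):
    """Crude model: every call saves one set, every return restores one."""
    return sum(1 for event in trace if event in "CR")


# Three calls deep, then 100 calls to leaf routines.
trace = "CCC" + "CR" * 100
print(windowed_traffic(trace, 8))    # 0
print(lazy_two_set_traffic(trace))   # 203
```

As long as the call depth wanders within the number of windows, the
windowed file makes no memory references at all, while the lazy scheme
generates traffic on every single call and return.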

Just because the memory interface isn't completely saturated *doesn't*
mean that the unused capacity can be used without penalty.  There is
an allocation penalty associated with arbitrating for any fixed 
resource; think of an intersection controlled by a stop light at rush
hour.  The resource simply can't be used 100% of the time:  we have
to switch the direction of traffic flow every now and then.  This is
the purpose behind bursting:  once the direction of traffic (memory)
flow has been established, speed through the intersection as fast as
possible. Changing the direction of flow after every car (memory
reference) is inefficient.  I think queuing theory also comes into
play here.  True, if you could look ahead at the instruction stream,
predict all the conditional branches correctly, etc., then you could
know exactly where to insert the lazy loads and stores.  But I think
this is unrealistic.
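The stop-light analogy can be turned into a back-of-the-envelope cost
model.  This sketch assumes each memory reference takes one cycle and each
switch between requesters costs a fixed turnaround penalty.  The 2-cycle
penalty, the function name, and the 'N' (normal reference) and 'L' (lazy
register transfer) labels are all made up for illustration, not measured
from any real memory system.

```python
def bus_cycles(sequence, turnaround):
    """Total cycles to service a sequence of memory references.

    sequence is a string naming the requester of each reference, e.g.
    'N' for a normal reference and 'L' for a lazy register transfer.
    Each reference costs 1 cycle; each change of requester costs
    'turnaround' extra cycles (an assumed, fixed penalty).
    """
    switches = sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    return len(sequence) + turnaround * switches


interleaved = "NL" * 8           # lazy transfers slipped in one at a time
bursted = "N" * 8 + "L" * 8      # same 16 references, done as two bursts

print(bus_cycles(interleaved, 2))  # 46
print(bus_cycles(bursted, 2))      # 18
```

The same 16 references cost more than twice as much when the direction of
flow changes after every one, which is the whole argument for bursting.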

You have to consider the realities of implementation!

If my analysis has erred, please enlighten me!