Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!imagen!atari!portal!cup.portal.com!bcase
From: bcase@cup.portal.com
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <9725@cup.portal.com>
Date: 4 Oct 88 20:24:20 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG>

LAZY STORE/LOAD SHOULD BE USED.
>you need only two *actual* register sets and more help only a very little.
[SEE THE ORIGINAL FOR ADDITIONAL VERBOSE CONTEXT]
>one can pretend to have as many register sets as
>one wants, even though the hardware need implement only two: by the time a
>register set must be reused for another set, the set will have been saved
>using lazy stores, hence, it will be ready and waiting. Likewise, when a set
>is to be removed, the hardware begins lazy-reloading of the previous set,
>hence no delay is seen. Compare this to the very-hard-to-hide delay of using
>burst stores, with or without register windowing....

In my opinion, this is BS: it is tantamount to saying you can predict the
sequence of calls and returns!! If my CPU starts lazily saving but the next
thing it does is return, it has wasted all the memory cycles spent lazily
saving. If my CPU lazily restores but the next thing it does is call, it has
wasted all the memory cycles spent restoring.

This is related to the question: "How many windows should be saved/restored
on a call/return?" This question was answered by the Berkeley people:
exactly one. The reason it is one is that you can't predict the future.

"But," you say, "the lazy cycles aren't wasted since they don't interfere
with normal memory references." Sorry, I don't buy this, because to insert
memory references so that they don't interfere means that you are able to
predict the pattern of memory references. Sure, by looking ahead and all
that, but this is *not simple*. You must be able to predict conditional
branches!
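To put a number on the "wasted cycles" point, here is a toy count, not a model of any real machine. It assumes a two-set register file that starts one lazy set-sized transfer on every call (saving the caller's set) and on every return (reloading the set below the caller's). The trace encoding ("C" for call, "R" for return) and the function name are made up for illustration.

```python
def count_wasted_lazy_transfers(trace):
    """Count lazy set transfers and how many of them were wasted.

    trace is a string of 'C' (call) and 'R' (return) events.
    Assumed toy model: every event starts one lazy transfer.
    A save started by a call is wasted if the very next event is a
    return (the caller's set was still sitting in the register file).
    A reload started by a return is wasted if the very next event is
    a call (the reloaded set is immediately overwritten by the callee).
    """
    if any(event not in "CR" for event in trace):
        raise ValueError("trace may contain only 'C' and 'R'")

    total = len(trace)  # one lazy transfer per call or return
    wasted = 0
    for current, following in zip(trace, trace[1:]):
        if current != following:  # 'CR' or 'RC': direction reversed
            wasted += 1
    return total, wasted
```

Under these assumptions, a program that calls three deep and then unwinds ("CCCRRR") wastes 1 of its 6 transfers, while a loop calling a leaf routine ("CRCRCR") wastes 5 of 6. The second pattern is the common one, which is the heart of the objection.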
Lastly, a register window scheme with multiple windows provides hysteresis
*without any* memory references at all! The two-window-plus-laziness scheme
doesn't.

Just because the memory interface isn't completely saturated *doesn't* mean
that the unused capacity can be used without penalty. There is an allocation
penalty associated with arbitrating for any fixed resource; think of an
intersection controlled by a stop light at rush hour. The resource simply
can't be used 100% of the time: we have to switch the direction of traffic
flow every now and then. This is the purpose behind bursting: once the
direction of traffic (memory) flow has been established, speed through the
intersection as fast as possible. Changing the direction of flow after every
car (memory reference) is inefficient. I think queuing theory also comes
into play here.

True, if you could look ahead at the instruction stream, predict correctly
all the conditional branches, etc., then you could know exactly where to
insert the lazy loads and stores. But I think this is unrealistic. You have
to consider the realities of implementation!

If my analysis has erred, please enlighten me!
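For what it's worth, the hysteresis claim can be sketched the same way. This is a rough model under stated assumptions, not a description of any particular chip: a file of `n_windows` windows, exactly one window spilled on overflow and one filled on underflow, and no window reserved for trap handling. The function name and the "C"/"R" trace encoding are hypothetical.

```python
def window_traffic(trace, n_windows):
    """Count window-sized memory transfers for a call/return trace.

    trace is a string of 'C' (call) and 'R' (return) events.
    Assumed toy model: the current frame starts out resident, an
    overflow spills exactly one window, and an underflow fills
    exactly one window.
    """
    if n_windows < 1:
        raise ValueError("need at least one window")

    resident = 1   # frames currently held in the register file
    spilled = 0    # frames currently saved in memory
    transfers = 0  # window-sized spills plus fills

    for event in trace:
        if event == "C":
            if resident == n_windows:
                # Overflow: spill the oldest window to make room.
                spilled += 1
                transfers += 1
            else:
                resident += 1
        elif event == "R":
            if resident > 1:
                # The caller's window is still in the file: no traffic.
                resident -= 1
            elif spilled > 0:
                # Underflow: fill the caller's window from memory.
                spilled -= 1
                transfers += 1
            else:
                raise ValueError("trace returns above its starting frame")
        else:
            raise ValueError("trace may contain only 'C' and 'R'")
    return transfers
```

In this model, with 8 windows, a trace that bounces between depths without exceeding 8 live frames (say "CCCRRR" repeated) generates no transfers at all. Traffic appears only when the call depth drifts past what the file holds, and then it is one window per overflow or underflow, as in the Berkeley answer.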