Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Summary: Why lazy store/reload is likely to be "free"
Message-ID: <9410@pur-ee.UUCP>
Date: 6 Oct 88 22:53:48 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG>
Organization: Purdue University Engineering Computer Network
Lines: 81

Pertaining to my posting about lazy store/reload of register frames:

In article <9725@cup.portal.com>, bcase@cup.portal.com writes:
> [Flaming as to lazy ops not being free and some obscure concept about
> needing to predict the future...]
> The reason it is one is that you can't predict the future.
> [More flaming about memory interference...]
> ... You must be able to predict conditional branches!
>
> Lastly, a register window scheme with multiple windows provides hysteresis
> *without any* memory references at all! The two-window-plus-laziness
> scheme doesn't.
>
> ... True, if you could look ahead at
> the instruction stream, predict correctly all the conditional branches
> etc., then you could know exactly where to insert the lazy loads and
> stores. But I think this is unrealistic.

You don't seem to know what lazy operations are. A lazy operation is NOT a simple "delayed" load or store, and it isn't triggered by an opcode being executed; it is an operation which is automatically triggered either when certain conditions exist which favor its efficient completion or when its result is required -- one does NOT insert lazy operations in anything.

Let's say you have 8 non-lazy windows (and one heck of a lot of valuable die space consumed by them). What do you do when the 9th nested call is made? The 10th? You do a sit-and-wait-for-it burst store, that's what... would you really describe that as being "*without any* memory references at all!"?
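The two triggers of a lazy operation described above (favorable conditions, or the result being required) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `LazyStoreQueue`, its method names, and the one-store-per-idle-cycle memory port are all hypothetical, chosen for illustration rather than taken from any real design.

```python
from collections import deque


class LazyStoreQueue:
    """Toy model of lazy stores (hypothetical; not a real hardware design)."""

    def __init__(self):
        self.pending = deque()   # (address, value) pairs not yet written
        self.memory = {}         # stand-in for main memory
        self.stall_cycles = 0    # cycles spent waiting on forced stores

    def request(self, address, value):
        # Nothing is written yet; the store is only recorded as pending.
        self.pending.append((address, value))

    def free_memory_cycle(self):
        # Trigger 1: conditions favor completion (the memory port is idle),
        # so one pending store completes at no cost to the program.
        if self.pending:
            address, value = self.pending.popleft()
            self.memory[address] = value

    def require(self, address):
        # Trigger 2: the result is needed now. Drain pending stores, in
        # order, until none remain for this address; each one costs a stall.
        while any(a == address for a, _ in self.pending):
            a, v = self.pending.popleft()
            self.memory[a] = v
            self.stall_cycles += 1
        return self.memory[address]
```

Note that no instruction in the program "inserts" a store here: `free_memory_cycle` and `require` stand for events the hardware would notice on its own.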
With lazy store/reload built-in, by the time you make a call nested deeper than you have windows, it is very likely that you've got at least one register set saved so that you can immediately reuse it -- without waiting for any memory references. When a procedure returns, the frame which was saved begins to lazily reload its values into the set it was flushed from.

Now, you can argue that there might not be enough time between calls or between returns to lazily store or reload a set, but that's unlikely because:

Calls: The lazy stores only have to store registers which are live and dirty, i.e., whose value will be referenced after return from the call and is not the same as a value already stored somewhere in memory (e.g., a variable) by the programmer's code. In most RISC processors, the only instructions able to make a register dirty are register = register op register... which all have a free memory reference cycle! In the worst case, you'd lag behind by just one register store, which sure beats doing a non-lazy burst every so often. QED.

Returns: The obvious worst case is if the instruction immediately after each call is a return. However, if you do return immediately, then those registers were not live after the call (see above): you would not have saved 'em in the first place, so they'd be restored in zero time. Since live values are usually kept in registers primarily to be operated upon, a similar argument to that given for calls can be applied... lazy reloading might not keep up with returns, but it shouldn't be far behind.

How unlikely? Specify an instruction set and timings and find out. That's one of the reasons I posted this -- to inspire a bit of thought and research. However, one has very good reason to believe it will work quite well, and if it works even nearly as well as non-lazy windowing but requires far fewer windows, just look at all that valuable die space you just bought!
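As one way to "specify an instruction set and timings and find out", here is a minimal Python sketch that counts store stalls on the call side only. Everything in it is an assumption made for illustration: the function `simulate_calls`, the event names `"alu"` and `"call"`, the rule that every register-to-register operation dirties one register and leaves one free memory cycle, and the rule that a non-lazy overflow burst-stores a whole window. Returns and reloads are not modeled.

```python
def simulate_calls(trace, windows, regs_per_window, lazy):
    """Return stall cycles spent storing frames for a trace of events.

    Hypothetical model: each 'alu' event dirties one register in the
    current frame and leaves one free memory cycle; each 'call' event
    needs a free window. Returns are not modeled.
    """
    frames = [0]   # unsaved dirty registers per resident frame, oldest first
    stalls = 0
    for event in trace:
        if event == "alu":
            frames[-1] = min(frames[-1] + 1, regs_per_window)
            if lazy:
                # Spend the free memory cycle saving one register of the
                # oldest suspended frame that still has unsaved state.
                for i in range(len(frames) - 1):
                    if frames[i] > 0:
                        frames[i] -= 1
                        break
        elif event == "call":
            if len(frames) == windows:
                oldest = frames.pop(0)
                # Lazy: wait only for registers still unsaved.
                # Non-lazy: burst-store the entire window.
                stalls += oldest if lazy else regs_per_window
            frames.append(0)
        else:
            raise ValueError(f"unknown event: {event}")
    return stalls


# Six nested calls, four register-to-register operations before each one.
steady = (["alu"] * 4 + ["call"]) * 6
lazy_stalls = simulate_calls(steady, windows=2, regs_per_window=8, lazy=True)
burst_stalls = simulate_calls(steady, windows=2, regs_per_window=8, lazy=False)
```

With two windows of eight registers, this toy trace gives no lazy stalls and 40 cycles of burst stores. Shortening the gap between calls makes the lazy scheme fall behind, which is exactly the case that real instruction timings would have to settle.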
As for predicting the future, it has very little to do with lazy operation; moreover, it's a misnomer. When statically examining code (e.g., at compile time) there is no past or future: you know everything probabilistically, and that's usually good enough. I agree that hardware can't predict the future, so you ought not try to make it do that... hardware is good at other things, like telling you exactly which way THIS branch goes or which memory module THIS pointer references. The static/dynamic tradeoffs are what my research group, CARP, is all about.

CARP: Compiler-oriented Architecture Researcher from Purdue
hankd@ee.ecn.purdue.edu