Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Register Windows (was Re: Japanese...)
Summary: Burst register<=>memory xfer considered harmful
Message-ID: <9361@pur-ee.UUCP>
Date: 3 Oct 88 21:06:22 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG>
Organization: Purdue University Engineering Computer Network
Lines: 85

In article , andrew@jung.harlqn.uucp (Andrew Watson) writes:
> In article <91@zeno.MN.ORG> gene@zeno.MN.ORG (Gene H. Olson) writes:
>
> [stuff as to ignoring register windows for SPARC...]
>
> SPARC offers you the option of register windows. If they don't
> help in some situations (they seem to work well in others) there
> is no reason you need use them. Since only two of them are
> required, there is no serious penalty if they are unused by an
> implementation. It's clearly a win-win situation.
>
> Just a small point from a compiler-writer who's been attempting to write a code
> generator for the SPARC that *does* ignore register windows - it's possible,
> but the architecture really doesn't support it.
>
> [complaint about having no burst register push/pop instructions]

At HICSS'21, in January 1988, I chaired a discussion on RISC architecture,
which quickly became a discussion of register windowing concepts. Several
key points surfaced in the discussion; the following is just a bit about
what you really want instead of burst register save/restore instructions....

LAZY STORE/LOAD SHOULD BE USED.

How Many Windows Do You Need?

For some reason, it has become popular to assume that the more registers
you have, the better off you are. However, consider window management
using lazy operations: you need only two *actual* register sets, and
additional sets help only a little.

Suppose we start executing in window A.
Suddenly, we find that we must execute a subroutine call, making the
current window B (yet keeping A in its register set). In most machines,
not every memory reference cycle is actually used to refer to memory...
let's suppose that F is the fraction of time during which the memory
reference time slot is empty. Suppose also that the number of
instructions between creating window A and switching to window B is I,
and the number of instructions executed between creating B and reverting
to A is J. Further suppose that there are R registers used in a typical
register window. Saving (or reloading) R registers using only the empty
memory slots takes about R*(1/F) instructions, so if I > R*(1/F) and
J > R*(1/F), then by having the hardware automatically save and restore
"in-use" registers lazily (whenever it sees free memory cycles), one can
pretend to have as many register sets as one wants, even though the
hardware need implement only two: by the time a register set must be
reused for another window, its contents will already have been saved
using lazy stores, so it will be ready and waiting. Likewise, when a
window is removed, the hardware begins lazily reloading the previous
window, so no delay is seen. Compare this to the very-hard-to-hide delay
of using burst stores, with or without register windowing....

For three conceptual sets, A, B, and C, the sequence is like:

    A begins in set 0
    A executes some stuff
    B begins in set 1, A remains in 0
    B executes some stuff, while A is lazily stored
    C begins in the set that had been A's (set 0), B remains in 1
    C executes some stuff, while B is lazily stored
    C terminates, B is still available in 1
    B continues to execute (in 1), while A is lazily reloaded (in 0)
    B terminates, A is now available again (set 0)
    A continues to execute in set 0
    A terminates

It is easy to see that having more sets simply makes the hardware more
tolerant of variations in R, F, I, and J (since these values are
effectively averaged over all the actual register sets).
Notice that a lazy-store/lazy-reload, register-name-addressable
top-of-stack cache is virtually equivalent to the lazy window scheme, but
is probably a bit more general.

...There is plenty more to say about this, if people are interested.
(Of course, when it comes to register window use, I'm biased: I'd rather
use the compiler-driven register/cache/CReg management that Chi and I
have developed; however, I'm perfectly happy to start a discussion which
will lead everyone else to the same conclusion ;-).

-Prof. Hank Dietz
Compiler-oriented Architecture Researcher from Purdue
hankd@ee.ecn.purdue.edu