Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!purdue!decwrl!granite!jmd
From: jmd@granite.dec.com (John Danskin)
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <287@granite.dec.com>
Date: 6 Oct 88 18:37:54 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG>
Reply-To: jmd@granite.UUCP (John Danskin)
Organization: DEC Technology Development, Palo Alto, CA
Lines: 78

I just love writing replies to unidentified flamers from the portal
system 8-).

In article <9725@cup.portal.com> bcase@cup.portal.com writes:
>> LAZY STORE/LOAD SHOULD BE USED.
>>you need only two *actual* register sets and more help only a very little.
>[SEE THE ORIGINAL FOR ADDITONAL VERBOSE CONTEXT]
>>one can pretend to have as many register sets as
>>one wants, even though the hardware need implement only two: by the time a
>>register set must be reused for another set, the set will have been saved
>>using lazy stores, hence, it will be ready and waiting. Likewise, when a set
>>is to be removed, the hardware begins lazy-reloading of the previous set,
>>hence no delay is seen. Compare this to the very-hard-to-hide delay of using
>>burst stores, with or without register windowing....
>
>In my opinion, this is BS: This is tantamount to saying you can predict
>the sequence of calls and returns!! If my CPU starts lazily saving but
>the next thing it does is return, it has wasted all the memory cycles spent
>lazily saving.

[Followed by a bunch more stuff about how you can't predict the
instruction stream]

There are a couple of ways people have proposed doing lazy saving:

o Katevenis (or somebody working with him) proposed trying to save
  register windows during unused memory cycles. They did some analysis
  and concluded that there were not enough unused cycles. Part of the
  problem here might have been that they needed to save the whole
  window before any of it was usable.

o When you start a new register window, flag all of the registers as
  'clean'. Whenever you update one, flag it as 'dirty'. Now, when it
  comes time to reuse a register window, whenever the processor tries
  to update a previously dirty register, copy the old value somewhere
  and try to save it during an otherwise unused memory cycle. On
  return, flag registers which have lost their values as 'empty' (or
  something). When the processor tries to refer to an empty register,
  the value is reloaded. This is really lazy. You only save/restore
  registers when you need to.

Neither of these schemes requires knowing the instruction stream in
advance. I guess the point that (unnamed) missed was that (other
unnamed) assumed that the hardware was managing the lazy loads/stores,
not the compiler.

Of course, these schemes produce more complicated silicon than burst
read/write schemes. In particular, anything with per-register flags
introduces latency into the register read/write path, which is the
critical path unless you have blown everything else.

Is it a win? Simulate and see. Tell the world.

By the way: there is a paper, "Register Windows Vs. General Registers:
A Comparison of Memory Access Patterns" by Scott Morrison and Nancy
Walker of UC Berkeley, which shows that the MIPS R2000 (aside from
running faster) makes fewer memory references (in almost all cases)
than SPARC with all levels of optimization and as many as 7 register
windows.

a) Does anyone know if/where (Earl?) this paper was published?
   (I got a copy from MIPS people, they love to give it away.)

b) Does anybody at SUN have an answer? (Tell us how they got it all
   wrong, register windows really DO save memory references.)

c) Anybody at AMD (Tim?) want to say something about how burst
   read/write makes the extra references OK?

-- 
John Danskin                    | decwrl!jmd
DEC Technology Development      | (415) 853-6724
100 Hamilton Avenue             | My comments are my own.
Palo Alto, CA 94306             | I do not speak for DEC.
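
To make the second (flag-per-register) scheme above a bit more
concrete, here is a minimal software sketch of it in Python. It is a
toy under stated assumptions, not anybody's actual hardware: it models
a single physical register set, it replaces the clean/dirty/empty flag
bits with one `holder` tag per register, and it performs the save
immediately instead of waiting for an otherwise unused memory cycle.
The class name `LazyRegisterFile`, its methods, and the `stores`/`loads`
counters are all made up for illustration.

```python
class LazyRegisterFile:
    """Toy model of one physical register set shared by nested calls.

    holder[r] is the call depth whose live value sits in register r, or
    None when the register is 'empty' (its value, if any, is in memory).
    """

    def __init__(self, nregs=8):
        self.regs = [0] * nregs
        self.holder = [None] * nregs
        self.depth = 0
        self.spill = [{}]        # spill[d]: register number -> saved value
        self.stores = 0          # memory writes caused by lazy saves
        self.loads = 0           # memory reads caused by lazy reloads

    def call(self):
        # A new window costs nothing up front: every register is 'clean'
        # for the callee, and the caller's values stay where they are.
        self.depth += 1
        self.spill.append({})

    def ret(self):
        # Registers the callee wrote are dead now; mark them 'empty'.
        for r, h in enumerate(self.holder):
            if h == self.depth:
                self.holder[r] = None
        self.spill.pop()
        self.depth -= 1

    def write(self, r, value):
        h = self.holder[r]
        if h is not None and h != self.depth:
            # About to clobber an ancestor's live value: save it first.
            self.spill[h][r] = self.regs[r]
            self.stores += 1
        self.regs[r] = value
        self.holder[r] = self.depth

    def read(self, r):
        if self.holder[r] != self.depth:
            # Value was displaced by a deeper call: reload on demand.
            # (KeyError here means this frame never wrote the register.)
            self.regs[r] = self.spill[self.depth].pop(r)
            self.holder[r] = self.depth
            self.loads += 1
        return self.regs[r]


rf = LazyRegisterFile(nregs=4)
rf.write(0, 11)
rf.call(); rf.ret()                    # call followed at once by return
assert (rf.stores, rf.loads) == (0, 0)
rf.call(); rf.write(0, 99); rf.ret()   # callee clobbers the caller's r0
assert rf.read(0) == 11                # caller gets its value back
assert (rf.stores, rf.loads) == (1, 1)
```

In this model a call followed immediately by a return generates no
memory traffic at all, and a register costs one store and one load only
if a callee actually overwrites it and the caller actually reads it
again. What the sketch cannot show is the cost raised above: the tag
check sits in the register read/write path.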