Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!imagen!atari!portal!cup.portal.com!bcase
From: bcase@cup.portal.com
Newsgroups: comp.arch
Subject: Re: Register Windows (was Re: Japanese...)
Message-ID: <9837@cup.portal.com>
Date: 7 Oct 88 21:19:41 GMT
References: <58@zeno.MN.ORG> <91@zeno.MN.ORG>

I just love writing replies to unidentified flamers from the portal system 8-).

??Er, I don't understand what you mean by unidentified; my login name was
included in your article.  Do you mean that my full name wasn't included?
It is Brian Case...  I am dismayed to discover that I am being judged
because my postings originate from Portal!  Anybody want to give me an
account on a respectable UNIX system?
|||||

>In article <9725@cup.portal.com> bcase@cup.portal.com writes:
>>In my opinion, this is BS:  This is tantamount to saying you can predict
>>the sequence of calls and returns!!  If my CPU starts lazily saving but
>>the next thing it does is return, it has wasted all the memory cycles spent
>>lazily saving.

John left out my comment that I believe the lazy loads/stores interfere
with normal memory traffic.  I am probably wrong though....

>None of these schemes [the ones he mentioned in his reply to my
>posting] imply knowing the instruction stream in advance.
>I guess the point that (unnamed) missed was that (other unnamed) assumed
>that the hardware was managing the lazy loads/stores, not the compiler.

Which unnamed am I?  :-)  Anyway, I was assuming that hardware would be
scheduling the lazy loads/stores.  While the hardware at least has
knowledge about what instructions are in its pipeline (safely assuming a
pipelined implementation, I think), it still can't predict the future.
I still maintain that a great deal of disruptive loading/storing can go
on, only for it to turn out later that it was wasted.  With more than two
windows, hysteresis causes the register file to capture the working set
of the register window stack.

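The hysteresis claim can be checked with a toy model.  What follows is a
minimal sketch in Python under loudly stated assumptions, not a
description of any real chip: one window per procedure, each trap moves
exactly one window to or from memory, no window is held in reserve, and
the name count_window_traps and the call/return trace are made up purely
for illustration.

```python
def count_window_traps(trace, num_windows):
    """Count window spills and fills for a call/return trace.

    Toy model (all assumptions, not any real design): each procedure
    owns one window, a full file spills its oldest window on a call,
    and a return into a non-resident window fills it back.
    """
    resident = 1   # windows currently held in the register file
    saved = 0      # windows spilled to memory
    spills = fills = 0
    for event in trace:
        if event == "call":
            if resident == num_windows:
                spills += 1      # overflow: push oldest window out
                saved += 1
            else:
                resident += 1
        elif event == "ret":
            if resident > 1:
                resident -= 1    # caller's window is still on chip
            elif saved > 0:
                fills += 1       # underflow: bring caller's window back
                saved -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError("unknown event: %r" % (event,))
    return spills, fills


# Hypothetical trace: go six calls deep, oscillate two levels ten
# times, then unwind.
TRACE = ["call"] * 6 + ["call", "call", "ret", "ret"] * 10 + ["ret"] * 6

for n in (2, 4, 8):
    print(n, "windows:", count_window_traps(TRACE, n))
```

On this made-up trace the two-window file keeps trapping on every
oscillation, while the four- and eight-window files stop trapping once
the range of the oscillation fits in the file.  It says nothing about
lazy loads/stores, which would need a model of the memory traffic too.
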
I guess the point that I was trying to make in my posting was that
hardware can't know exactly when to insert the lazy loads/stores:  if it
inserts one that causes a cache miss (though this is probably unlikely;
there are probably better cases to use as an example) and immediately
after the insertion a conditional branch is taken to a "real" load (one
that the program wants to do), the "real" load will be delayed by the
lazy load/store.  But, you say, if the lazy load/store had been done
explicitly, it would be guaranteed to delay the progress of the program.
Yes, but if the explicit lazy loads/stores are done as part of a burst,
each one is much more efficient.

I think one thing that might end this discussion is an exposition of the
hardware algorithm that would be used to implement lazy load/store.  It
can't be complicated, so it should be suitable for posting.  If it is
good and works, then it will be clear that I was wrong to attack the
original suggestion that two windows are sufficient.  (Of course, there
is a whole contingent who says that "no windows" is sufficient!)

>Of course, these schemes produce more complicated silicon than burst
>read/write schemes. In particular, anything with flags/register
>introduces latency into the register read/write path which is the
>critical path unless you have blown everything.

As I have said before, I think that one of the various caches in the
system (excluding the register file if you think of that as a cache),
e.g., the TLB or instruction cache, will be the critical path.

>Is it a win? simulate and see. Tell the world.

Yes, this is the right thing to do.

>By the way:
> There is a paper: [from Berkeley]
> Which shows that the MIPS R2000 (aside from running faster) achieves
> fewer memory references (in almost all cases) than SPARC with all
> levels of optimization and as many as 7 register windows.

Well, this is news to me!
I must blush and apologize for many of my past postings if these results
are true for architectures like the 29K.  I do hope they included the
effects of high-bandwidth, sequential-access memories.
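
The burst argument made earlier is just amortization of the first-access
cost over sequential words.  Here is a rough sketch of that arithmetic in
Python; the function name transfer_cycles and the cycle counts (4 for a
first access, 1 for each sequential word after it) are hypothetical
numbers picked for illustration, not measurements of the 29K or any
other machine.

```python
def transfer_cycles(num_words, first_access, next_access, burst):
    """Cycles to move num_words registers to or from memory.

    Assumed model: a burst pays the first-access latency once and then
    streams the remaining words; isolated transfers pay it every time.
    """
    if num_words == 0:
        return 0
    if burst:
        return first_access + (num_words - 1) * next_access
    return num_words * first_access


# Hypothetical parameters: 16 registers, 4-cycle first access,
# 1 cycle per sequential word after that.
as_burst = transfer_cycles(16, first_access=4, next_access=1, burst=True)
one_by_one = transfer_cycles(16, first_access=4, next_access=1, burst=False)
print("burst:", as_burst, "cycles; one at a time:", one_by_one, "cycles")
```

Under those assumed numbers the burst takes 19 cycles against 64 for
sixteen isolated transfers.  Whether scattered lazy transfers can hide
their cost in otherwise idle memory cycles is exactly the part this
sketch does not model.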