Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!uwvax!astroatc!johnw
From: johnw@astroatc.UUCP
Newsgroups: comp.arch
Subject: Re: RISC register windows
Message-ID: <161@astroatc.UUCP>
Date: Wed, 25-Feb-87 16:42:07 EST
Article-I.D.: astroatc.161
Posted: Wed Feb 25 16:42:07 1987
Date-Received: Fri, 27-Feb-87 20:36:51 EST
References: <1881@homxc.UUCP> <898@moscom.UUCP> <476@mntgfx.MENTOR.COM>
Reply-To: johnw@astroatc.UUCP (John F. Wardale)
Organization: Astronautics Technology Cntr, Madison, WI
Lines: 101
Keywords: performace, cache

It's clear to me from the questions that many people don't understand
RISC.  RISC stands for Reduced Instruction Set Computer.

Features common to RISC machines:
+ One load and one store instruction are the only memory accesses.
  (All others are reg-to-reg.)
+ Regular (read: easy to decode) instructions, usually a single
  (or limited set of) instruction format(s)
+ Usually a fairly small set of instructions

Additional features of the Berkeley RISC machine:
+ Register windows
+ Delayed branches
+ Reg 0 is always 0
+ A one-bit field in the instruction to set condition codes
  (This was a great feature, in my opinion)
+ 3 operands/instruction

The purpose of register windows was to remove the need for a cache.
Since this was a student project, a real cache design was deemed too
hard.  In practice, I think the Sun/3's and similar machines are
proving that windows are not the "only-true-way" to get good
performance.

The division of registers:
-------------------------
Reg 0       always 0
Reg 1-9     Global (static) registers
Reg 10-15   (6) input parameters
Reg 16-25   (10) locals
Reg 26-31   (6) output parameters

This 6/10/6 division of registers is arbitrary, but is kept fixed to
ensure maximum speed.  Making it dynamic would require a size register
(effectively an nargs() value), probably *IN* the window (otherwise it
would not be possible to unwind the call stack).
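
The fixed split can be sketched as a mapping from (window number,
register number) to a slot in the physical register file.  This is an
illustration only, not the Berkeley hardware: it assumes the 6/10/6
split occupies r10-r15 (inputs), r16-r25 (locals) and r26-r31
(outputs), that a caller's outputs become its callee's inputs, and
that the file holds 8 windows.  The function name and the window
count are made up for the example.

```python
# Sketch only: the names and the 8-window file size are assumptions,
# not taken from the post or from the Berkeley design documents.
NUM_GLOBALS = 10                      # r0-r9, shared by every window
NUM_IN = 6                            # r10-r15, input parameters
NUM_LOCAL = 10                        # r16-r25, locals
NUM_OUT = 6                           # r26-r31, output parameters
WINDOW_STRIDE = NUM_LOCAL + NUM_OUT   # 16 fresh registers per call
NUM_WINDOWS = 8                       # assumed size of the window file


def physical_register(window, reg):
    """Map register `reg` (0-31) of window `window` to a physical slot."""
    if not 0 <= reg <= 31:
        raise ValueError("register number must fit the 5-bit field")
    if reg < NUM_GLOBALS:
        return reg                    # globals ignore the window number
    offset = reg - NUM_GLOBALS        # 0..21 within the window
    ring = NUM_WINDOWS * WINDOW_STRIDE
    # Each call advances by 16 slots, so the callee's 6 inputs land on
    # the caller's 6 outputs; the modulo makes the file circular.
    return NUM_GLOBALS + (window * WINDOW_STRIDE + offset) % ring


# The caller's first output (r26) is the callee's first input (r10).
assert physical_register(0, 26) == physical_register(1, 10)
```

Because every size is a constant, the mapping is one multiply-free
add and a wrap in hardware.  A dynamic split would need the size
stored per window before this calculation could even start.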

The 6/10/6 division was selected carefully, after weeks of study on
real code (see the RISC article in Computer, a few years back).

 - - - - - - - - - - - - - -

As for dealing with register window overflow and underflow, the
Berkeley paper (as I recall) said that its occurrence was not frequent
enough to sweat about.

"Dribbling" registers in or out at the ends of the window stack (a
circular queue with a dead zone to represent the stack) would be fine
for a small machine.

Thought experiment: Let's build a *FAST* machine (I-cache and
data cache) with windows, and a memory system that's wide enough to
fill a full window of registers in one (memory access + split-transfer)
time unit.  Now what happens when you want to run a new process???
The state you must save is not just your current window, but your
*WHOLE* stack of windows!

Windows are great for running benchmarks, but are not so good for
doing context switching.  Anyone know how Pyramid solved this???
How many users (vi, csh, compiles, etc.) does it take to bring a
Pyramid to its knees??

> >From: eppstein@tom.columbia.edu (David Eppstein)
> Subject: register window machine questions
>
> (1) Has anyone tried making the window block size be just one register,
> i.e. the window can have an arbitrary alignment in relation to memory?
> I would expect this to be more efficient in terms of registers used (and
> therefore also memory), but register access time might suffer.

I don't understand this question!  The instruction has bit fields to
specify the register, so you can't go over the max register number,
and if you don't use all your registers, you're wasting resources
anyway!  I would venture to guess that FAST machines spend less than
10% of their time running within 10% of their PEAK performance!!

> (2) dirty regs and (3) static/window --> see above
> David Eppstein, eppstein@cs.columbia.edu, Columbia U. Computer Science Dept.
> ---------------------------------------------
>
> >From: chuck@amdahl.UUCP (Charles Simmons)
> Subject: Re: subroutine frequency
>
> Are there reasons I am missing that make fixed sized windows extremely
> advantageous?

Just that fixed is simple (fast), easy to do, and doesn't require a
lot of state to keep track of.  If you can show how dynamic windows
are better when you assume that memory *BANDWIDTH* is high enough, and
it's just a *LATENCY* problem we have to deal with, then I'll vote
with you.

NOTES: bandwidth refers to throughput, or a long-term performance
       average.
       latency refers to response, or the short-term time it takes
       to get what you ask for.
It's frequently easy to trade bandwidth for latency, but it's hard to
improve *BOTH* at the same time!

			John W

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:	astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!