Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!mailrus!tut.cis.ohio-state.edu!rutgers!ucla-cs!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: Longer load/store because of register windows
Message-ID: <17268@shemp.CS.UCLA.EDU>
Date: 27 Oct 88 17:42:16 GMT
References: <156@gloom.UUCP> <310@lynx.zyx.SE> <332@pvab.UUCP> <15964@agate.BERKELEY.EDU> <23367@amdcad.AMD.COM> <16003@agate.BERKELEY.EDU> <469@oracle.UUCP> <7041@winchester.mips.COM>
Sender: news@CS.UCLA.EDU
Reply-To: marc@cs.ucla.edu (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 42

>In article <469@oracle.UUCP> csimmons@oracle.UUCP (Charles Simmons) writes:
>
>If I remember the arguments from MIPS correctly (want to help me out
>John?), there's a stronger objection to multiple-window-register-files.
>I think it's something to the effect that register-windows cause the
>load/store access time to be slower.

Having a multiple-window register file, or more precisely, having many
registers, slows down the processor cycle. Even with an independent
port for the load/store, the operation is still based on the basic
processor cycle. With a longer cycle the load/store accesses become
slower. There are two reasons for the longer cycle:

1) For a large register file, let's say 128 registers, decoding the
   register addresses takes longer (there are more bits to decode, and
   even if you use partial decoding there is still a penalty).
2) The data bus is longer because it has to run across so many
   registers. A longer data bus implies larger capacitance and a
   longer discharge time, and thus a longer processor cycle.

Usually the access to the register file, either on a register
read/write or on a load/store, is part of the critical path.

You can play some tricks to get around those drawbacks. For example,
the Am29000 uses overlapping to avoid the penalty caused by the
decoding. Even though the hardware is quite expensive (3 large
decoders, 3 small adders, and some multiplexers), it is a gain.
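
The two reasons given above can be put into a rough first-order model.
The sketch below is only an illustration: the function names, the
32-register baseline, and the assumption that bit-line capacitance grows
linearly with the number of register rows are all assumptions of the
sketch, not measurements from any real register file.

```python
def address_bits(num_registers):
    """Address bits a full decoder has to handle for num_registers."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 2, without float rounding.
    return max(1, (num_registers - 1).bit_length())


def relative_bitline_delay(num_registers, baseline_registers=32):
    """Bit-line discharge time relative to a baseline register file.

    Assumption: the bit line crosses every register row, so its
    capacitance (and the time a fixed-strength cell needs to discharge
    it) scales linearly with the number of rows.
    """
    return num_registers / baseline_registers


# Compare a few register file sizes against the hypothetical baseline.
for n in (32, 64, 128):
    print(n, address_bits(n), relative_bitline_delay(n))
```

Under these assumptions a 128-register file needs two more address bits
than a 32-register file and has a bit line four times as heavy, which is
the kind of penalty the tricks in this article try to hide.
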
The Intel 80960 uses a cache for local register sets. I haven't seen
the layout :-), but it seems like the sets are separated in such a way
that the data bus is not lengthened.

Finally, you can organize the layout in such a way that the current
register set is always at the same place. Every time there is a change
of window, you need to shift out the current window to a backup window
and shift in the new window into the current window. This whole
operation can be done in ONE cycle for register files of a reasonable
size (we've done it for a register file of 128 registers). This method
makes the length of the data bus independent of the number of windows.

So the question is: Is it clever to invest in a large register file
with windows, or is it better to use the silicon for other circuitry?
The answer depends on how good your compiler people are!

					Marc Tremblay
					marc@CS.UCLA.EDU
					...!(ihnp4,ucbvax)!ucla-cs!marc
					Computer Science Department, UCLA
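
P.S. The fixed-location scheme described above (the current window
always sits in the same place, the other windows in backup storage) can
be sketched behaviourally as follows. The class name, method names,
window count, and window size are hypothetical, windows do not overlap
here, and the sketch says nothing about how the hardware manages the
shift in one cycle.

```python
class ShiftingRegisterFile:
    """Behavioural model: the active window always lives in self.current."""

    def __init__(self, num_windows=8, window_size=16):
        # Backup storage, one slot per window (sizes are assumptions).
        self.backup = [[0] * window_size for _ in range(num_windows)]
        self.active = 0
        # The fixed-location window that all reads and writes go to.
        self.current = [0] * window_size

    def switch_to(self, new_window):
        if not 0 <= new_window < len(self.backup):
            raise IndexError("no such window")
        # Shift the current window out to its backup slot ...
        self.backup[self.active] = list(self.current)
        # ... and shift the new window into the fixed location.
        self.current = list(self.backup[new_window])
        self.active = new_window

    def read(self, reg):
        return self.current[reg]

    def write(self, reg, value):
        self.current[reg] = value


rf = ShiftingRegisterFile()
rf.write(3, 42)
rf.switch_to(1)      # e.g. on a procedure call
rf.write(3, 7)
rf.switch_to(0)      # e.g. on the matching return
print(rf.read(3))    # prints 42
```

Reads and writes only ever touch `current`, so in this model the access
path does not depend on how many backup windows exist.
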