Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!elroy!cit-vax!ucla-cs!oahu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: 80960 Register windows
Message-ID: <11464@shemp.CS.UCLA.EDU>
Date: 21 Apr 88 03:16:33 GMT
References: <3358@omepd> <29454@linus.UUCP> <3392@omepd> <385@bacchus.DEC.COM>
Sender: news@CS.UCLA.EDU
Reply-To: marc@oahu.UUCP (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 48
Keywords: 80960, RISC, embedded control

In article <385@bacchus.DEC.COM> alverson@decwrl.UUCP (Robert Alverson) writes:
>After hearing about the 80960 for the last few days, I still have a few
>questions:
>
>1. How are parameters passed into procedures?  In Berkeley's RISC, this
>   was accomplished with overlapping windows.  What I have read seems to
>   imply that the 80960 windows do not overlap.  How about procedure
>   return values?

Indeed the windows do not overlap. There are a few ways to get around
this problem, even though it is not as efficient:

1) Parameters can be passed through global registers. Obviously
   problems occur when the depth of subroutine calls gets large
   (for example recursive calls).
2) Parameters can be pushed onto the stack (in memory), as in a
   conventional register scheme.
3) For a long list of parameters, a pointer to an argument list
   can be placed in a global register.
4) Finally, the 80960 provides an instruction (flushreg) which writes
   the contents of all the local register sets (in the register cache)
   to their associated stack frames in memory. This method could be
   used to pass parameters through-local-registers-of-the-caller.

>2. Just how much extra delay does register windows cost?  There may be extra
>   decoding, or extra loading on lines, or extra time during call & ret (to
>   switch register sets).  Nothing is free.
>

In the Berkeley-like window schemes, the larger the number of windows, the
longer the READ delay; this is due to a longer data bus, which increases the
load capacitance.
Intel partly solves this problem by using a register cache. I do not have
access to their layouts, but I doubt that the internal data bus goes through
the register cache. In this way they can increase the number of local
register sets that can be saved on chip without *directly* lengthening the
data bus. One important "indirect delay" introduced by adding more sets is
related to the saving of those sets (done four words at a time!): a larger
register cache will increase the saving time. For a few more sets it may not
even be in the critical path, though.

I also wrote a paper describing another method; I will send you the
reference if you request it via e-mail.

                                        Marc Tremblay
                                        marc@CS.UCLA.EDU
                                        ...!(ihnp4,ucbvax)!ucla-cs!marc
                                        Computer Science Department, UCLA
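
A rough sketch, in C, of the "indirect delay" argument above: if cached local
register sets are written out four words per transfer, the number of
transfers grows linearly with the number of sets held on chip. The sketch
assumes 16 local registers per set (r0-r15 on the 80960). The helper name
save_transfers is hypothetical, and it counts memory transfers only, not
cycles; it is not taken from any Intel documentation.

```c
#define LOCAL_REGS_PER_SET 16  /* assumption: r0-r15, one word each */
#define WORDS_PER_TRANSFER 4   /* four words per transfer, as in the post */

/* Hypothetical helper: number of memory transfers needed to write
   `sets` cached local register sets out to their stack frames. */
unsigned save_transfers(unsigned sets)
{
    unsigned words = sets * LOCAL_REGS_PER_SET;

    /* Round up in case a set size is not a multiple of the transfer size. */
    return (words + WORDS_PER_TRANSFER - 1) / WORDS_PER_TRANSFER;
}
```

Under these assumptions one set costs 4 transfers and a four-set cache costs
16, so doubling the register cache doubles the worst-case save work even
though the data bus itself is no longer.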