Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!elroy!cit-vax!ucla-cs!oahu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: 80960 Register windows
Message-ID: <11464@shemp.CS.UCLA.EDU>
Date: 21 Apr 88 03:16:33 GMT
References: <3358@omepd> <29454@linus.UUCP> <3392@omepd> <385@bacchus.DEC.COM>
Sender: news@CS.UCLA.EDU
Reply-To: marc@oahu.UUCP (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 48
Keywords: 80960, RISC, embedded control

In article <385@bacchus.DEC.COM> alverson@decwrl.UUCP (Robert Alverson) writes:
>After hearing about the 80960 for the last few days, I still have a few
>questions:
>
>1. How are parameters passed into procedures?  In Berkeley's RISC, this
>   was accomplished with overlapping windows.  What I have read seems to
>   imply that the 80960 windows do not overlap.  How about procedure
>   return values?

Indeed, the windows do not overlap. There are a few ways to get around
this, even though none of them is as efficient as overlapping windows:

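	1) Parameters can be passed through global registers. Obviously, problems
	   occur when the depth of subroutine calls gets large (for example,
	   recursive calls), because every level shares the same globals and
	   must save its own arguments before making a call of its own.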

	2) Parameters can be pushed onto the stack (in memory), as in a
	   conventional register scheme.

	3) For a long list of parameters, a pointer to an argument list can
	   be placed in a global register.

	4) Finally, the 80960 provides an instruction (flushreg) that
	   writes the contents of all the local register sets (in the
	   register cache) to their associated stack frames in memory.
	   This could be used to pass parameters through the caller's local
	   registers: after a flushreg, the callee can read them from the
	   caller's frame in memory.
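
Methods 1 and 4 can be compared in a toy model. This is a minimal sketch, not Intel's design: it assumes 16 global registers shared by every procedure, non-overlapping 16-word local sets, and an on-chip cache of nsets local sets (4 here, an arbitrary choice) that spills its oldest set to that set's stack frame when a call finds the cache full. The class RegisterFile, its methods, and the "argument and result in g0" convention are all hypothetical, and the flushreg method only loosely imitates the real instruction. sum_to illustrates method 1 and why recursion hurts it; callee_reads_caller illustrates method 4.

```python
# Toy model of non-overlapping register windows with a register cache.
# Hypothetical and simplified: sizes, names and conventions are assumptions.
NGLOBALS = 16   # g0-g15, shared by every procedure (assumed)
NLOCALS = 16    # r0-r15, one private set per activation (assumed)


class RegisterFile:
    def __init__(self, nsets=4):
        self.nsets = nsets                  # local sets held on chip
        self.g = [0] * NGLOBALS             # global registers
        self.depth = 0                      # call depth of running procedure
        self.cache = {0: [0] * NLOCALS}     # depth -> local set on chip
        self.memory = {}                    # depth -> local set in its frame
        self.spills = self.fills = 0

    @property
    def r(self):
        """Local registers of the running procedure."""
        return self.cache[self.depth]

    def call(self):
        if len(self.cache) == self.nsets:   # cache full: spill the oldest set
            oldest = min(self.cache)
            self.memory[oldest] = self.cache.pop(oldest)
            self.spills += 1
        self.depth += 1
        self.cache[self.depth] = [0] * NLOCALS  # fresh set, no overlap

    def ret(self):
        del self.cache[self.depth]          # callee's set is discarded
        self.depth -= 1
        if self.depth not in self.cache:    # caller was spilled: reload it
            self.cache[self.depth] = self.memory.pop(self.depth)
            self.fills += 1

    def flushreg(self):
        """Loose imitation: copy every cached set to its stack frame."""
        for d, regs in self.cache.items():
            self.memory[d] = list(regs)


def sum_to(rf):
    """Method 1: argument and result in g0 (hypothetical convention).
    The argument is copied to a local before the nested call,
    because the callee reuses the same global register."""
    rf.call()
    rf.r[0] = rf.g[0]
    if rf.r[0] == 0:
        rf.g[0] = 0
    else:
        rf.g[0] = rf.r[0] - 1
        sum_to(rf)                          # overwrites g0 with its result
        rf.g[0] += rf.r[0]
    rf.ret()


def callee_reads_caller(rf):
    """Method 4: after flushreg the caller's r0/r1 sit in its frame."""
    rf.call()
    rf.flushreg()
    a, b = rf.memory[rf.depth - 1][:2]      # caller's r0 and r1
    rf.g[0] = a * b                         # result returned in g0
    rf.ret()


rf = RegisterFile()
rf.g[0] = 10
sum_to(rf)
print(rf.g[0], rf.spills, rf.fills)         # 55 8 8

rf2 = RegisterFile()
rf2.r[0], rf2.r[1] = 6, 7                   # "arguments" left in caller's locals
callee_reads_caller(rf2)
print(rf2.g[0])                             # 42
```

With a smaller nsets the same recursion depth produces more spills and fills; with nsets larger than the call depth it produces none. That is the price the non-overlapping scheme pays when calls nest deeply.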

>2. Just how much extra delay does register windows cost?  There may be extra
>   decoding, or extra loading on lines, or extra time during call & ret (to
>   switch register sets).  Nothing is free.
>

In Berkeley-like window schemes, the larger the number of windows, the
longer the READ delay, because the longer data bus increases the load
capacitance. Intel partly solves this problem by using a register cache.
I do not have access to their layouts, but I doubt that the internal data
bus goes through the register cache. That way they can increase the number
of local register sets saved on chip without *directly* lengthening the
data bus. One important "indirect delay" introduced by adding more sets is
the time needed to save those sets (done four words at a time!): a larger
register cache takes longer to save. For a few more sets, though, this may
not even be in the critical path.
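
The growth in saving time can be put in rough numbers. This is a minimal sketch: only the four-words-per-store figure comes from the post; the 16-word set size and one store per cycle with no memory stalls are assumptions, and save_cycles is a hypothetical helper, not an Intel timing.

```python
WORDS_PER_SET = 16      # assumed: 16 local registers per set
WORDS_PER_STORE = 4     # "four words at a time", as stated in the post
CYCLES_PER_STORE = 1    # hypothetical: ideal memory, one store per cycle


def save_cycles(nsets, cycles_per_store=CYCLES_PER_STORE):
    """Cycles to write nsets local register sets back to memory."""
    stores_per_set = -(-WORDS_PER_SET // WORDS_PER_STORE)  # ceiling division
    return nsets * stores_per_set * cycles_per_store


print("spill one set on overflow:", save_cycles(1), "cycles")
for n in (4, 8, 16):
    print(f"save a full cache of {n:2d} sets: {save_cycles(n):3d} cycles")
```

Assuming an overflow spills only the oldest set, that cost stays fixed as the cache grows, while saving the whole cache (for example, on flushreg) scales linearly with the number of sets.
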
I also wrote a paper describing another method; I will send you the
reference if you request it via e-mail.
					Marc Tremblay
					marc@CS.UCLA.EDU
					...!(ihnp4,ucbvax)!ucla-cs!marc
					Computer Science Department, UCLA