Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!oakhill!phillip From: phillip@oakhill.UUCP (Mike Phillip) Newsgroups: comp.sys.m88k Subject: Re: Register Allocation (was Re: Info about 88open & standards) Summary: (long) Message-ID: <2647@bushwood.oakhill.UUCP> Date: 20 Nov 89 19:10:42 GMT References: <1948@psueea.UUCP> <1989Nov14.175806.23483@paris.ics.uci.edu> <5063@tekcrl.LABS.TEK.COM> <2631@yogi.oakhill.UUCP> <1989Nov16.212149.9770@paris.ics.uci.edu> Reply-To: oakhill!bushwood!phillip@cs.utexas.edu (Mike Phillip) Organization: Motorola Inc., Austin, Texas Lines: 155 In article <1989Nov16.212149.9770@paris.ics.uci.edu> Ron Guilmette writes: > >My original question was "Why should the caller save *any* of his live >registers when he has absolutely no knowledge of whether or not *any* of >them will be used (i.e. clobbered) in the called routine?" > >I don't think that I have yet seen a good answer to this question. > I think that the issues of parameter passing and register allocation have become somewhat confused in this discussion thus far, as have the relative merits of callee vs caller saved registers. To make matters even more confusing, the "meta"-issue of compiler technology has been introduced without clearly distinguishing between local, global and inter-procedural register allocation (Note that these are THREE different cases). First of all, the parameter passing registers (r2-r9) are really irrelevant to the caller vs. callee discussion, since they are effectively treated as "caller save" in ALL cases. It just so happens that OCS designates that these registers may be used to hold subroutine parameters during calling sequences. The callee is then free to use these registers as it sees fit without preserving the values/variables/temporaries (choose your favorite term) associated with them. Thus, combining these registers with registers r10-r13 provides a total of 12 "caller save" registers. The OCS also provided 12 "callee save" registers, r14-r25. The main point of debate seems to be the rationale behind the chosen division between caller and callee registers. [... deletion of some discussion ...] > >What if there are *never* any dead registers? Consider this. Given the >typical program (which uses at least 30 or more "values" throughout >its execution lifetime) it may be possible to use *all* available registers >for at least *some* productive purpose at *all* points throughout a program. > >Even without having global data-flow information, at the very least you could >decide to keep several simple global variables in registers, and the odds >are very good that you would get at least some performance increase as a >result of doing this. > >On the other hand, if you *do* have good global data flow analysis, then >there is no doubt whatsoever that you should be able to use all registers >for "live" values at all points throughout a program. > >Thus, you should always have registers filled with live values (in any case). > >Thus, you should always have a "pure" caller-saves convention. QED > This is where things get tricky... The key point to recognize is that regardless of compiler technology, there will likely always need to be a distinction between caller and callee saved registers. When global (but not inter-procedural) register allocation is performed, it is not possible to determine the register usage of called subroutines at the point of the call (i.e. caller cannot see callee). If all registers were designated as being in "caller save" mode, code would need to be inserted by the compiler to "save" the contents of all registers whose associated variables have a live range that spans the subroutine call EACH TIME such a call was made. (Note that this does not necessarily imply that ALL registers need to be saved... some registers may be "dead" at the point of the call). In cases where the callee needs only a few registers, such a calling convention would likely result in unnecessary load and store instructions which use up valuable instruction issuing and memory bandwidth. At the other extreme, if all registers were used in "callee save" mode (with the exception of the parameter passing registers), the callee would generate load and store instructions for every register that it uses, regardless of whether such overhead is warranted by the "live" use of the same register in the caller. Thus, some division is needed between caller and callee saved registers. Why did OCS choose 12/12 ? I wasn't involved in the decision making process, but the issue only becomes significant when the following criteria are met: 1) The callee requires more than 12 registers to perform adequate optimizations. 2) The additional "callee save" registers used actually did NOT need to be saved and restored because their live range did not span the subroutine call in the caller. How often does this occur? Empirically speaking, beats me. (I don't have any such data at my disposal at the moment). But your above argument that good overall optimization in a compiler will result in most registers holding live values across calls actually MINIMIZES the effect of having "too many" caller or callee saved registers. In fact, if the register being used in a callee needs to be saved SOMEWHERE (i.e. the caller has a live value in it across the call), it can be argued that "callee" saved registers are preferable, since the callee can insert the saves and restores only in those flow paths where the register contents are clobbered. In such cases, "caller save" conventions can result in unnecessary overhead. Now onward to inter-procedural analysis... (A good reference for this topic is "Minimizing Register Usage Penalty at Procedure Calls" by Fred Chow from SIGPLAN Notices, July 88) As described in the above paper, "callee save" registers can effectively be treated as "caller save" registers by performing a depth-first traversal of the program call graph. (i.e. by the time the caller is analyzed, it has all relevant register usage info about the callee). The callee can then defer saving and restoring to the caller, making the register appear as though it was operating in "caller save" mode. There are limitations to this approach, however. These limitations occur when the compiler does not have complete register usage information for a particular subroutine (libraries are a commonly cited example). Other examples include: 1) recursive calls 2) calls made through subroutine pointers 3) separate compilation In such cases, even "globally-optimizing, inter-procedural analyzing" compilers need to resort to a default convention, and the trade-offs of caller vs. callee once again become relevant. Thus, OCS does not prevent optimizing compilers from taking advantage of inter-procedural register allocation, but provides the "default" for those cases when such optimizations cannot be made. >>The current scheme is a fairly efficient compromise between the two, that will >>do in the absence of global register allocation. > >Ah, ha! So if I understand you correctly, you are saying that this parameter >passing convention was intentionally designed to be a "compromise" which >would not make stupid compilers look too bad (even at the expense of >decreasing performance in the presence of a good modern compiler with >data-flow analysis). Is that correct? > Yes, the inclusion of both caller and callee save registers is a compromise... always has been, and probably always will... I disagree, however, that the compromise is in place to mask compiler deficiencies or that it hinders the development of state-of-the-art optimizing compilers. (And yes, I know, the Moto compilers can use improvement :^) As pointed out above, the distinction is needed even when inter-procedural analysis is used for register allocation. When such compiler technology is implemented, the relative distribution of caller vs. callee registers becomes LESS critical, since the compiler can essentially "convert" callee-saved registers into caller-saved registers by "pushing" them up in the call graph. Geez, I figured most of the register allocation debates would involve r26-r29, which are the so-called "linker registers". ;^) Mike Phillip Motorola 88000 Development E-mail: oakhill!bushwood!phillip@cs.utexas.edu or cs.utexas.edu!oakhill!bushwood!phillip