Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uunet!samsung!usc!ucsd!ucsdhub!hp-sdd!hplabs!hpda!hpcuhc!edwardm
From: edwardm@hpcuhc.HP.COM (Edward McClanahan)
Newsgroups: comp.sys.m88k
Subject: Re: Register Allocation (was Re: Info about 88open & standards)
Message-ID: <100050002@hpcuhc.HP.COM>
Date: 27 Nov 89 19:14:50 GMT
References: <1989Nov16.212149.9770@paris.ics.uci.edu>
Organization: Hewlett Packard, Cupertino
Lines: 44

Tom Horsley touches on an interesting side issue:

> I can point out one obvious flaw in the idea that the called routine should
> do the register saves. We have several benchmarks as well as several real
> programs (as opposed to the typical benchmark :-) in which it is possible
> to examine the code and see that the current conventions produce fantastic
> code. This occurs (quite frequently, I might add) when the outer routine
> contains loops, and the loops contain subroutine calls. Very often (because
> 12 registers is really an awful lot of registers) the leaf routine does
> not need to save any registers at all. (For instance, this is true of quite
> a lot of the low-level str and mem routines in the C library).
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As I recall, the VAX has TWO calling conventions:

  1 - CALLS and CALLG explicitly require an "entry mask" in the called
      procedure indicating which registers to push on the stack.  This
      mask is 16 bits, one bit per register, although you never save
      certain registers (e.g. R0, PC, etc...).  The microcode interpreting
      the CALLx instruction actually does the PUSHes for you.

  2 - JSR and BSR simply push the return PC on the stack and jump to the
      called procedure.  The callee must then "protect" any registers it
      uses.

Both of these schemes implement the callee-saved model, but the JSR/BSR is
faster for "low-level...routines".

In HP's RISC architecture, we have both callee and caller saved registers
(as is the case in the m88k standard).
Still, for some "low-level stuff", we needed a faster way to call a
function.  We implemented "millicode" to fill this gap.  The caller simply
doesn't have to save any caller-saved registers it happens to be using at
the time.  Similarly, the optimizer doesn't have to halt optimization
across the millicode call.  Those "low-level str and mem routines" are
implemented in millicode.

Is there any provision for this calling convention in the m88k standard?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Edward McClanahan
  Hewlett Packard Company
  Mail Stop 47UE              -or-     edwardm%hpda@hplabs.hp.com
  19447 Pruneridge Avenue
  Cupertino, CA 95014                  Phone: (408)447-5651