Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!lll-lcc!well!msudoc!crlt!michael
From: michael@crlt.UUCP
Newsgroups: comp.arch
Subject: Re: subroutine frequency
Message-ID: <647@crlt.UUCP>
Date: Fri, 20-Feb-87 15:41:28 EST
Article-I.D.: crlt.647
Posted: Fri Feb 20 15:41:28 1987
Date-Received: Sun, 22-Feb-87 10:44:36 EST
References: <1881@homxc.UUCP> <898@moscom.UUCP>
Organization: McClary Associates, Ann Arbor MI
Lines: 101
Keywords: register stack frame variable
Summary: Linkage "b" slower than "c" if compiler not smart. [Yum yum!]

In article <898@moscom.UUCP>, jgp@moscom.UUCP (Jim Prescott) writes:
>
> >do compilers take the time to keep track of which registers have
> >been used and only save the 'dirty' ones or do most call and
> >return mechanisms save the entire register set on the stack?
>
> It depends on the architecture and the compiler, the three easy ways
> to do it are:
> a) save all registers
> b) have the caller save only the registers it is using
> c) have the callee save only the registers it will use
> pdp-11's use "a" since they only have 3 register variables anyway. Most
> 68k compilers use "c" since you get about 12 register variables. I don't
> know of anyone who uses "b" but it should be about as efficient as "c".

In code generated by a brute-force compiler, "b" wastes a lot of CPU by
saving registers that will never be trashed.  Suppose you're using, say,
five of them, and make calls to subroutines that will only use one.
Method "c" does one-fifth as much register save/restoration as method "b".

The only place where method "c" saves a register that method "b" wouldn't
is near the top level of your program, where a given register might not
yet contain anything that must be preserved (unless the author of the O.S.
was a total idiot).  Here method "c" will waste time saving and restoring
junk.  But the top-level code would normally be executed much less often
than the lower-level stuff.
On the other hand, method "b" could gain a global advantage when
subroutines with few register variables make many calls to subroutines
with many, which don't in turn make calls to another generation of
children (or at least not while their registers need preservation).  In
cases like this, method "b" has saved the register once, while "c" would
save it many times.

Method "b" has other advantages as well:

 - It makes the caller, not the callee, responsible for the integrity of
   its own environment.  Thus, if a hand-coded routine makes an error in
   register preservation, it will break itself (and finding the error will
   be easy), not previously debugged portions of the calling routine.

 - It offers the opportunity to save CPU by omitting the storage and
   restoration of register variables that no longer contain anything of
   value (and a smart enough compiler might be able to determine this).

In article <540@sei.cmu.edu.UUCP> firth@sei.cmu.edu.UUCP (Robert Firth) writes:
>In article <898@moscom.UUCP> jgp@moscom.UUCP (Jim Prescott) writes:
>>
>>The method used has a large effect on whether setjmp/longjmp can put the
>>correct values back into register variables (SYSVID says they may be
>>unpredictable :-(.
>
>The codegenerators I wrote for the PDP-11 and VAX-11 use method (b).  The
>main reason for this was precisely the longjump problem: if local frames
>store non-local state, then that state can be restored only by the very
>slow process of unwinding the stack. []

I seem to be missing something here.  Why can't setjmp save, in "env",
the entire register set plus as much of the stack as the caller pushes
when it makes the call?  All "longjmp" needs to do is restore the state
of the caller as of the "setjmp" call, and it is allowed, and even
encouraged, to know what kind of code its particular compiler generates.
If the caller isn't doing such things as varying its own stack frame,
saving >its< caller's registers only when they might be used locally
(rather than saving them once on entry and restoring them on exit), and
shuttling register variables off to holding areas at unknowable
locations in its frame, this will be sufficient.  (And if the compiler
is smart enough to do this sort of thing, it can also be smart enough to
recognize that "setjmp" is being called, and provide as many hooks as
necessary.)

I would think that "b" could cause more problems for setjmp/longjmp than
"c", since setjmp would have to find and save a variable number of
stored registers, and longjmp restore them, without damaging other
things in the caller's frame.  What have I overlooked?

>Well, I benchmarked this technique [b] against the alternative of having the
>callee save, and it came out better on both machines. []  The main reasons
>for the difference are interesting:
>
>(a) fewer registers are involved.  This is because the callee must save
>    every register it uses ANYWHERE in its body, whereas the caller need
>    save only registers CURRENTLY LIVE.
>
>(b) fewer memory accesses.  Callee must save and restore always; caller can
>    restore the register from a declared variable some (~1/3) of the time,
>    and so need not save it.

I find this benchmark interesting.  Does (b) actually make up for
routines that use fewer register variables?  Or do those savings get
canceled by the reduced number of saves when the callee has more?

===========================================================================
"I've got code in my node."    | UUCP:  ...!ihnp4!itivax!node!michael
                               | AUDIO: (313) 973-8787
Michael McClary                | SNAIL: 2091 Chalmers, Ann Arbor MI 48104
---------------------------------------------------------------------------
Above opinions are the official position of McClary Associates.  Customers
may have opinions of their own, which are given all the attention paid for.
===========================================================================