Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!lll-lcc!well!msudoc!crlt!michael
From: michael@crlt.UUCP
Newsgroups: comp.arch
Subject: Re: subroutine frequency
Message-ID: <647@crlt.UUCP>
Date: Fri, 20-Feb-87 15:41:28 EST
Article-I.D.: crlt.647
Posted: Fri Feb 20 15:41:28 1987
Date-Received: Sun, 22-Feb-87 10:44:36 EST
References: <1881@homxc.UUCP> <898@moscom.UUCP>
Organization: McClary Associates, Ann Arbor MI
Lines: 101
Keywords: register stack frame variable
Summary: Linkage "b" slower than "c" if compiler not smart.

In article <898@moscom.UUCP>, jgp@moscom.UUCP (Jim Prescott) writes:
> 
> >do compilers take the time to keep track of which registers have
> >been used and only save the 'dirty' ones or do most call and
> >return mechanisms save the entire register set on the stack?
>
> It depends on the architecture and the compiler, the three easy ways
> to do it are:
> 	a) save all registers
> 	b) have the caller save only the registers it is using
> 	c) have the callee save only the registers it will use
> pdp-11's use "a" since they only have 3 register variables anyway.  Most
> 68k compilers use "c" since you get about 12 register variables.  I don't
> know of anyone who uses "b" but it should be about as efficient as "c".

In code generated by a brute-force compiler, "b" wastes a lot of CPU by
saving registers that will never be trashed.  Suppose you're using, say,
five of them, and make calls to subroutines that each use only one.
Method "c" does one-fifth as much register save/restoration as method "b".
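
To put rough numbers on that, here is a minimal sketch in C of the save
counts.  The function names and the cost model (one "save" is one register
stored before a call and reloaded after it) are illustrative assumptions,
not taken from any real compiler.

```c
/* Hypothetical cost model: one "save" means one register stored before
   a call and reloaded after it. */

/* Method "b", brute force: the caller saves every register it is using
   around each call, whether or not the callee will touch it. */
unsigned long caller_saves(unsigned caller_regs, unsigned long calls)
{
    return (unsigned long)caller_regs * calls;
}

/* Method "c": the callee saves only the registers it will use itself. */
unsigned long callee_saves(unsigned callee_regs, unsigned long calls)
{
    return (unsigned long)callee_regs * calls;
}
```

With five registers in use in the caller and one used by each callee,
caller_saves(5, 1000) is 5000 and callee_saves(1, 1000) is 1000, which is
the one-fifth ratio above.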

The only place where method "c" saves a register that method "b" would
not is near the top level of your program, where a given register might
not yet contain anything that must be preserved (unless the author of the
O.S. was a total idiot).  Here method "c" will waste time saving and
restoring junk.  But the top level code would normally be executed much
less often than the lower level stuff.

On the other hand, method "b" could gain a global advantage when subroutines
with few register variables make many calls to subroutines with many, which
don't in turn make calls to another generation of children (or at least not
while their registers need preservation).  In cases like this, method "b"
saves the register once, while "c" saves it again on each of the many calls.
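
One way to read this case is as a three-level chain: a routine G with many
live registers calls P once, and P (few registers) calls a leaf C (many
registers) n times.  The sketch below counts saves under that assumption;
the chain, the names, and the cost model are hypothetical.

```c
/* Hypothetical chain: G calls P once, P calls leaf C n times.
   One "save" is one register stored and later reloaded. */

/* Method "b": each caller saves its own live registers around each call.
   G saves g_live registers once; P saves p_live registers per call to C. */
unsigned long chain_caller_saves(unsigned g_live, unsigned p_live,
                                 unsigned long n)
{
    return g_live + (unsigned long)p_live * n;
}

/* Method "c": each routine saves, on entry, the registers it will use.
   P saves p_used registers once; C saves c_used registers on every call. */
unsigned long chain_callee_saves(unsigned p_used, unsigned c_used,
                                 unsigned long n)
{
    return p_used + (unsigned long)c_used * n;
}
```

With six registers in G and C, one in P, and n = 100, this gives 106 saves
for "b" against 601 for "c".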

Method "b" has other advantages as well:

 - It makes the caller, not the callee, responsible for the integrity of
   its own environment.  Thus, if a hand-coded routine makes an error in
   register preservation, it will break itself (and finding the error will
   be easy), not previously-debugged portions of the calling routine.

 - It offers the opportunity to save CPU by omitting the storage and
   restoration of register variables that no longer contain anything
   of value (and a smart enough compiler might be able to determine this).
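
The second point can be sketched with register bitmasks: a smarter
caller-save compiler would save only the registers that are both in use
and still needed after the call.  The mask type and function names below
are assumptions made for illustration.

```c
/* Bit i set means register i.  The masks and names are hypothetical. */
typedef unsigned int regmask;

/* Count the set bits in a mask, i.e. the number of registers to save. */
unsigned count_regs(regmask m)
{
    unsigned n = 0;
    while (m) {
        m &= m - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Brute-force "b": save everything the caller has allocated. */
regmask save_set_brute(regmask in_use)
{
    return in_use;
}

/* Smarter "b": save only registers whose values are still needed
   after the call returns. */
regmask save_set_live(regmask in_use, regmask live_after_call)
{
    return in_use & live_after_call;
}
```

If five registers are in use (mask 0x1F) but only two hold values needed
after the call (mask 0x03), the brute-force set has five members and the
liveness-based set has two.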

In article <540@sei.cmu.edu.UUCP> firth@sei.cmu.edu.UUCP (Robert Firth) writes:
>In article <898@moscom.UUCP> jgp@moscom.UUCP (Jim Prescott) writes:
>>
>>The method used has a large effect on whether setjmp/longjmp can put the
>>correct values back into register variables (SYSVID says they may be
>>unpredictable :-(.
>
>The codegenerators I wrote for the PDP-11 and VAX-11 use method (b).  The
>main reason for this was precisely the longjump problem: if local frames
>store non-local state, then that state can be restored only by the very
>slow process of unwinding the stack. []

I seem to be missing something here.  Why can't setjmp save the entire
register set, plus as much of the stack as the caller would push in
making the call, in "env"?  All "longjmp" needs to do is restore
the state of the caller as of the "setjmp" call, and it is allowed,
and even encouraged, to know what kind of code its particular compiler
generates.  This will be sufficient unless the caller does things like
varying its own stack frame, saving >its< caller's registers only when
they might be used locally (rather than once on entry, with a restore
on exit), or shuttling register variables off to holding areas at
unknowable locations in its frame.  (And if the compiler is smart
enough to do this sort of thing, it can also be smart enough to
recognize that "setjmp" is being called, and provide as many hooks
as necessary.)
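
Seen from the programmer's side, the hazard looks like the sketch below.
It uses only the standard <setjmp.h> interface; the function names are made
up for the example.  ISO C leaves a non-volatile local that is modified
between setjmp and longjmp with an indeterminate value afterwards, so the
local here is declared volatile to keep it out of registers.

```c
#include <setjmp.h>

static jmp_buf env;   /* whatever state this implementation's setjmp saves */

static void fail(void)
{
    longjmp(env, 1);  /* resume at the setjmp call site, returning 1 */
}

/* Returns the value of count as seen after the longjmp. */
int count_after_longjmp(void)
{
    /* volatile forces count to live in memory, so its value does not
       depend on how register variables are saved and restored. */
    volatile int count = 0;

    if (setjmp(env) == 0) {
        count = 1;    /* modified after setjmp, before longjmp */
        fail();       /* does not return here */
    }
    return count;     /* reached via longjmp */
}
```

Dropping the volatile qualifier is what exposes the register-variable
behaviour being argued about: the result then depends on the compiler's
linkage and on what setjmp chooses to store in env.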

I would think that "b" could cause more problems for setjmp/longjmp
than "c", since setjmp would have to find and save a variable number
of stored registers, and longjmp would have to restore them, without
damaging other things in the caller's frame.  What have I overlooked?

>Well, I benchmarked this technique [b] against the alternative of having the
>callee save, and it came out better on both machines.  []  The main reasons
>for the difference are interesting:
>
>(a) fewer registers are involved.  This is because the callee must save
>    every register it uses ANYWHERE in its body, whereas the caller need
>    save only registers CURRENTLY LIVE.
>
>(b) fewer memory accesses.  Callee must save and restore always; caller can
>    restore the register from a declared variable some (~1/3) of the time,
>    and so need not save it.

I find this benchmark interesting.  Does (b) actually make up for routines
that use fewer register variables?  Or do those savings get canceled by
the reduced number of saves when the callee has more?
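
Firth's reason (b) can be put in rough numbers with a sketch like the one
below.  It assumes a full save costs two memory accesses (a store and a
reload) and that a register mirroring a declared variable needs only the
reload.  The names and the cost model are hypothetical, and it says nothing
about how his code generators actually work.

```c
/* Hypothetical count of memory accesses needed to preserve registers
   across one call.  A full save is a store plus a reload: 2 accesses. */

/* Callee-save: every register the callee uses is stored and reloaded. */
unsigned callee_accesses(unsigned callee_regs)
{
    return 2u * callee_regs;
}

/* Caller-save: in_memory of the live_regs registers already mirror a
   declared variable in memory, so they need only the reload. */
unsigned caller_accesses(unsigned live_regs, unsigned in_memory)
{
    if (in_memory > live_regs)
        in_memory = live_regs;   /* cannot skip more stores than exist */
    return 2u * (live_regs - in_memory) + in_memory;
}
```

With three registers on each side and one of the caller's three reloadable
from its variable (the ~1/3 figure), the caller needs five accesses to the
callee's six.  Changing the register counts on either side shows how
quickly the two effects can trade places.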

===========================================================================
  "I've got code in my node."	| UUCP:  ...!ihnp4!itivax!node!michael
				| AUDIO: (313) 973-8787
	Michael McClary		| SNAIL: 2091 Chalmers, Ann Arbor MI 48104
---------------------------------------------------------------------------
Above opinions are the official position of McClary Associates.  Customers
may have opinions of their own, which are given all the attention paid for.
===========================================================================