Path: utzoo!utgpu!attcan!uunet!husc6!linus!alliant!cantrell
From: cantrell@Alliant.COM (Paul Cantrell)
Newsgroups: comp.arch
Subject: Re: register save/restore
Message-ID: <2601@alliant.Alliant.COM>
Date: 4 Nov 88 11:58:38 GMT
References: <3300037@m.cs.uiuc.edu> <5938@killer.DALLAS.TX.US> <7580@aw.sei.cmu.edu>
Reply-To: cantrell@alliant.Alliant.COM (Paul Cantrell)
Organization: Alliant Computer Systems, Littleton, MA
Lines: 134

I'd like to make some minor comments on a really good article by Robert Firth
on register save procedures across procedure calls.

Having programmed several 680x0 systems with the callee-saves convention, and
now working on a 680x0 architecture where the caller saves its own
registers, I've had the chance to program the same instruction set under
both conventions.

In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>First, and most important, if you are designing a professional-quality
>production compiler, this is the wrong question.  Such a compiler must
>perform interprocedural optimisation if it is to be respectably state
>of the art.  
>
>However, if you want to design a prototype, amateur, or deliberately
>low-cost compiler, the issue is probably one worth considering.  To
>keep this note short, I'm going to assume you understand the basic
>issue and are familiar with current hardware and software technology.

Well, you may have slightly overstated this - I'd guess that 98% of the
production-quality compilers available today do not do interprocedural
optimization. However, I agree that it is desirable.

>C. Which is more efficient?
>
>Happily, however, the efficiency arguments, in my experience, support
>the "caller saves" strategy, so one can indeed do well by doing good.
>
	[he goes on to describe the longjump case as being more efficient
	 when caller saves]

I would tend to ignore longjump since this is an infrequently used mechanism
compared to procedure calls in general. I think the efficiency of the basic
call/return is what needs to be looked at here. He strongly argues that I
shouldn't feel that way, but I'll leave it at that.

>In my experience, there is almost no difference between the number
>of registers used by the caller and the number used by the callee.
>
>Small procedures tend to use fewer than less small, and leaf procedures
>tend to be a bit smaller, so on balance it seems marginally better for
>the callee to save. (What this also tells us is that interprocedural
>optimisation of leaves and leaf-callers only will give you big returns)

Yes, this is one problem I have with caller saving - it substantially
increases the cost of calling small procedures that need very few registers.
The register save/restore done by the caller can easily outweigh the entire
cost of the procedure itself, if it is something simple like a queue
manipulation or an assembly language routine which gives you access to a
special instruction. I don't think the word 'marginal' applies here - from
code inspection, I think this can account for a lot of wasted time. As
you point out, it simply argues strongly for interprocedural analysis. (An
obvious thing to do for such simple leaf procedures is to inline them, and
get rid of the procedure call overhead entirely.)
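
As a rough illustration of the kind of leaf routine I mean, here is a sketch
of a queue insertion in C. The qnode type and the q_init/q_insert_tail names
are hypothetical, and 'static inline' is a later (C99) spelling used only for
illustration. The body of the insert is four stores, so a save/restore of
several registers around a real call could plausibly cost more than the
routine itself.

```c
/* Hypothetical doubly linked queue node; not taken from any real system. */
struct qnode {
    struct qnode *next;
    struct qnode *prev;
};

/* Make `head` an empty circular queue (it points at itself). */
static inline void q_init(struct qnode *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `n` at the tail of the circular queue headed by `head`.
   Four stores in total: a typical tiny leaf procedure. */
static inline void q_insert_tail(struct qnode *head, struct qnode *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}
```

If the compiler expands q_insert_tail at the call site, there is no call
boundary left, so neither convention has anything to save.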

A nasty side effect in our compiler (you could argue that this is simply
a bug in the register allocation, but I think it's a little more complicated
than that) is that for small C routines, adding 'register' declarations may
actually slow the code down by causing many save/restores to be generated.
How much depends on where the variable is used, how often, and where the
procedure calls fall in relation to the uses of the register variable.
My only point is that the programmer expects that adding 'register' to
those variables which are used frequently should make his code run faster,
not slower. In the callee saves convention, it is usually trivial for the
programmer to determine whether 'register' is called for - it is almost
certainly based on how many times he uses the variable within the procedure.
But for caller saves, it is almost impossible for him to tell.
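
To make the 'register' problem concrete, here is a small sketch. The
functions step and sum_steps are hypothetical, and the comments describe what
a naive caller-saves compiler might do, not what any particular compiler
actually emits.

```c
/* Hypothetical external procedure; stands in for any call the
   compiler cannot see through. */
int step(int x)
{
    return x + 1;
}

int sum_steps(int n)
{
    register int total = 0;  /* used every iteration, so 'register' looks right */
    register int i;          /* likewise */

    for (i = 0; i < n; i++) {
        /* Both total and i are live across this call. Under caller
           saves, a naive compiler saves and restores both registers
           around every call, i.e. on every trip through the loop.
           Under callee saves, sum_steps saves the two registers once
           at entry and restores them once at exit. */
        total += step(i);
    }
    return total;
}
```

From the source alone, the programmer sees two heavily used variables and
adds 'register'. Whether that helps or hurts depends on the call in the
loop, which is exactly what is hard to judge under caller saves.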

>But this is outweighed by two factors
>
>* The callee must save all registers it will use throughout the body;
>  the caller need save only the registers that are live at the point of
>  call.
>
>* When two or more calls occur in succession, both callees must save,
>  but the caller need save only once.

From code inspection of typical C code, the first point doesn't seem to be
much of a win or a loss. It's true that only the live registers need be saved
if the caller is saving, but in 'good' C code there are typically enough
registers in use (if the compiler has done a decent job of register allocation)
that you end up saving a large number of registers anyway.

The second point, that you can avoid multiple save/restores when you have
several procedure calls in a row, is certainly true. But again, the code
inspection I have done shows that a fair amount of the time you end up
doing the full save/restore on each call, because conditional branching
makes the path through the calls unpredictable at compile time. However,
this can sometimes be a large win - I suspect that this is the single
largest reason that you can expect a performance gain with caller saving.
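
The trade-off in the last two paragraphs can be put into a back-of-envelope
model. This is only a sketch under simplifying assumptions: cost is counted
in register save/restore pairs, every pair costs the same, and the function
names and the notion of a "save group" are mine, invented for illustration.

```c
#include <assert.h>

/* Callee saves: every callee saves each register it uses, on every call.
   regs_used_by_callee[i] is the register count for the i-th call. */
int callee_saves_pairs(const int regs_used_by_callee[], int n_calls)
{
    int total = 0;
    assert(n_calls >= 0);
    for (int i = 0; i < n_calls; i++)
        total += regs_used_by_callee[i];
    return total;
}

/* Caller saves: the caller saves its live registers once per save group.
   A straight-line run of calls forms one group. If branching forces a
   save/restore around each call, there are as many groups as calls. */
int caller_saves_pairs(int live_regs, int n_save_groups)
{
    assert(live_regs >= 0 && n_save_groups >= 0);
    return live_regs * n_save_groups;
}
```

For example, take a caller with 6 live registers making three calls to leaf
procedures that use 2 registers each. Callee saves costs 2+2+2 = 6 pairs.
Caller saves costs 6 pairs if one save covers all three calls, but 18 if
branching forces a save/restore around each call.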

Anyway, here is a list of what I consider the pros and cons of the caller
saving its own registers:

Pros:
	1) Avoids multiple save/restore operations across consecutive
	   procedure calls.

	2) Saved register state is local to owner, not buried on the stack
	   by the various called procedures.

	3) Only 'live' registers need be saved.

	4) If a copy of the data exists and is easy to obtain, no save need
	   be done.

Cons:

	1) Often causes more saves than required when calling leaf procedures,
	   which are small and need few registers. Since calling a leaf is
	   the most common operation, the penalty becomes large.

	2) Makes programs slightly larger. Instead of one copy of the register
	   save/restore, there has to be a copy at every invocation. This may
	   have a performance impact because of cache size and main memory size.
	   However, Pro#1 may decrease the impact of this some.

	3) For assembly language programming, code may be slightly harder to
	   write and understand since determining which registers must be
	   saved/restored depends on how the thread of control can be affected
	   by conditional branching, etc. Typically, with the callee saves
	   convention, the registers would be saved/restored at entry/exit
	   time (I'm gonna get flamed on that one).

Conclusion:

Neither convention seems to be all that much better. I'd say that caller saving
has a slight edge performance-wise, while callee saving has a slight edge in
terms of readability/maintainability (only if you are using assembly language).

I think interprocedural analysis would be enough of a win over either of these
two methods that it strongly argues for people to move in that direction.

					PC