Path: utzoo!utgpu!attcan!uunet!husc6!linus!alliant!cantrell
From: cantrell@Alliant.COM (Paul Cantrell)
Newsgroups: comp.arch
Subject: Re: register save/restore
Message-ID: <2601@alliant.Alliant.COM>
Date: 4 Nov 88 11:58:38 GMT
References: <3300037@m.cs.uiuc.edu> <5938@killer.DALLAS.TX.US> <7580@aw.sei.cmu.edu>
Reply-To: cantrell@alliant.Alliant.COM (Paul Cantrell)
Organization: Alliant Computer Systems, Littleton, MA
Lines: 134

I'd like to make some minor comments on a really good article by Robert
Firth on register save procedures across procedure calls. Having
programmed several 680x0 systems with registers saved by the callee, and
now working on a 680x0 architecture which has the caller save its own
registers, I've had the chance to program the same instruction set with
both conventions.

In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>First, and most important, if you are designing a professional-quality
>production compiler, this is the wrong question.  Such a compiler must
>perform interprocedural optimisation if it is to be respectably state
>of the art.
>
>However, if you want to design a prototype, amateur, or deliberately
>low-cost compiler, the issue is probably one worth considering.  To
>keep this note short, I'm going to assume you understand the basic
>issue and are familiar with current hardware and software technology.

Well, you may have slightly overstated this - I'd guess that 98% of the
production-quality compilers available today do not do interprocedural
optimizations. However, I agree that this is desirable.

>C. Which is more efficient?
>
>Happily, however, the efficiency arguments, in my experience, support
>the "caller saves" strategy, so one can indeed do well by doing good.
>
[he goes on to describe the longjump case as being more efficient when
caller saves]

I would tend to ignore longjump since this is an infrequently used
mechanism compared to procedure calls in general.
I think the efficiency of the basic call/return is what needs to be
looked at here. He strongly argues that I shouldn't feel that way, but
I'll leave it at that.

>In my experience, there is almost no difference between the number
>of registers used by the caller and the number used by the callee.
>
>Small procedures tend to use fewer than less small, and leaf procedures
>tend to be a bit smaller, so on balance it seems marginally better for
>the callee to save.  (What this also tells us is that interprocedural
>optimisation of leaves and leaf-callers only will give you big returns)

Yes, this is one problem I have with caller saving - it substantially
increases the cost of calling small procedures that need very few
registers. The register save/restore done by the caller can easily
outweigh the entire cost of the procedure itself, if it is something
simple like a queue manipulation or an assembly language routine which
gives you access to a special instruction. I don't think the word
'marginal' applies here - from doing code inspection I think this can
account for a lot of wasted time. As you point out, it simply argues
strongly for interprocedural analysis. (An obvious thing to do for such
simple leaf procedures is to inline them, and get rid of the procedure
call overhead entirely).

A nasty side effect of our compiler (you could argue that this is simply
a bug in the register allocation, but I think it's a little more
complicated than that) is that for small C routines, adding 'register'
declarations may actually slow the code down by causing many
save/restores to be generated. This obviously depends on where the
variable is used, how often, and where the procedure calls are in
relation to usage of the register variable. My only point is that the
programmer expects that adding 'register' to those variables which are
used frequently should make his code run faster, not slower.
In the callee saves convention, it is usually trivial for the programmer
to determine whether 'register' is called for - it is almost certainly
based on how many times he uses the variable within the procedure. But
for caller saves, it is almost impossible for him to tell.

>But this is outweighed by two factors
>
>* The callee must save all registers it will use throughout the body;
>  the caller need save only the registers that are live at the point of
>  call.
>
>* When two or more calls occur in succession, both callees must save,
>  but the caller need save only once.

From code inspection of typical C code, the first point doesn't seem to
be much of a win or loss. It's true that only the live registers need be
saved if the caller is saving, but in 'good' C code there are typically
enough registers in use (if the compiler has done a decent job of
register allocation) that you end up saving a large number of registers
anyway.

The second point, that you can avoid multiple save/restores when you
have several procedure calls in a row, is certainly true, but again, the
code inspection I have done shows that a fair amount of the time you end
up doing all the save/restoring on each one, because conditional
branching makes the path through the calls unpredictable at compile
time. However, this sometimes can be a large win - I suspect that this
is the single largest reason that you can expect a performance gain with
caller saving.

Anyway, here is a list of what I consider the pros and cons of the
caller saving his own registers:

Pros:
    1) Avoids multiple save/restore operations across consecutive
       procedure calls.

    2) Saved register state is local to owner, not buried on the stack
       by the various called procedures.

    3) Only 'live' registers need be saved.

    4) If a copy of the data exists and is easy to obtain, no save need
       be done.

Cons:
    1) Often causes more saves than required when calling leaf
       procedures, since they are small and need few registers. Because
       calling them is the most common operation, the penalty becomes
       large.

    2) Makes programs slightly larger. Instead of one copy of the
       register save/restore, there has to be a copy at every
       invocation. This may have a performance impact because of cache
       size and main memory size. However, Pro #1 may decrease the
       impact of this somewhat.

    3) For assembly language programming, code may be slightly harder
       to write and understand, since determining which registers must
       be saved/restored depends on how the thread of control can be
       affected by conditional branching, etc. Typically, with the
       callee saves convention, the registers would be saved/restored
       at entry/exit time (I'm gonna get flamed on that one).

Conclusion:

Neither convention seems to be all that much better. I'd say that caller
saving has a slight edge performance-wise, and callee saving has a slight
edge in terms of readability/maintainability (only if you are using
assembly language). I think interprocedural analysis would be enough of
a win over either of these two methods that it strongly argues for
people to move in that direction.

					PC
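
To make the counting argument in the pros and cons a little more
concrete, here is a minimal sketch in C of a toy cost model. It is an
illustration only: the struct, the function names and every number fed
to it are assumptions invented for the example, not measurements from
any compiler. It counts save/restore pairs for one caller, assuming all
registers follow a single convention, that a run of back-to-back calls
lets the caller save once (Pro 1, and only live registers per Pro 3),
and that any intervening branch or register use forces a fresh save.

```c
#include <assert.h>
#include <stddef.h>

/* One call site inside a caller. All fields are hypothetical inputs. */
struct call_site {
    unsigned live_in_caller;   /* caller registers live across this call */
    unsigned used_by_callee;   /* registers the callee uses in its body */
    int      starts_new_group; /* nonzero if an earlier save cannot be
                                  reused (branch or register use between
                                  this call and the previous one) */
};

/* Callee saves: every callee saves each register it uses, on every call. */
unsigned callee_saves_cost(const struct call_site *sites, size_t n)
{
    unsigned pairs = 0;

    assert(sites != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pairs += sites[i].used_by_callee;
    return pairs;
}

/* Caller saves: one save per group of consecutive calls, covering the
 * largest live set seen at any call site in that group. */
unsigned caller_saves_cost(const struct call_site *sites, size_t n)
{
    unsigned pairs = 0;
    unsigned group_max = 0;

    assert(sites != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || sites[i].starts_new_group) {
            pairs += group_max;   /* close out the previous group */
            group_max = 0;
        }
        if (sites[i].live_in_caller > group_max)
            group_max = sites[i].live_in_caller;
    }
    return pairs + group_max;     /* close out the final group */
}
```

Under these assumptions, three back-to-back calls with 4 live registers
and callees that use 3 registers each cost 4 pairs with caller saves
against 9 with callee saves. Put a branch between each call and caller
saves rises to 12. A single call to a leaf that uses 1 register, made
with 6 registers live, costs 6 against 1, which is Con 1 in miniature.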