Xref: utzoo comp.lang.c:27080 comp.lang.misc:4544
Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!yale!cmcl2!lanl!lambda!jlg
From: jlg@lambda.UUCP (Jim Giles)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Re: function calls
Message-ID: <14281@lambda.UUCP>
Date: 20 Mar 90 23:07:40 GMT
References: <29551@amdcad.AMD.COM>
Lines: 51

From article <29551@amdcad.AMD.COM>, by tim@nucleus.amd.com (Tim Olson):
> [...]
> It might be true that scientific routines written in FORTRAN may have
> this many live, non-overlapping variables to keep in registers, but I
> don't believe this is true in general.  Statistics from a large
> collection of programs and library routines (a mix of general and
> scientific applications written in C) show that of 782 functions (620
> of which were non-leaf functions), an average of 6.5 registers per
> function were live across function calls.

This statistic can only be interpreted in one way: the C compiler in
question didn't allocate registers very well.

Especially in scientific packages, there are _HUGE_ numbers of 'live'
_VALUES_ to deal with during execution of even simple routines.  Vectors,
arrays, lists, strings, etc., are all being either produced or consumed.
The fact that none of these _VALUES_ were in registers at the time of the
call indicates one of two things:

  1) the code in question was fragmented to the point that most
     procedures had only a few data items (and scalar ones at that), or
  2) the compiler simply wasn't using the registers anywhere near their
     potential.

Since you imply the code was well written, I reject the first
explanation.  That leaves the compiler.  My experience (I don't have
statistics) with both Fortran and C is that good compilers generally PACK
the registers with as much live data as possible.  Even an apparently pure
scalar loop that does only 'simple' operations may be 'unrolled' a few
times to make better use of the registers.
Compilers are becoming available that apply this optimization
automatically (so the point isn't limited to hand-tuned, 'coder enhanced'
code).  If such a loop (and this is still the simple kind, mind you) had a
procedure call embedded in it, all the registers that the procedure might
use would have to be spilled to memory and then reloaded on return.  On
the Cray, spilling and reloading just _ONE_ vector register takes over 150
clocks: 64 elements at one clock each going out to memory and another 64
coming back (128 clocks), plus time for the memory pipeline, plus a little
overhead to set up stride, address, etc.  This is _not_ a tiny problem.

Again I say, this scheme of having 'preserved' (callee-saved) vs. 'temp'
(caller-saved) registers for procedure calls only _appears_ to save time.
In truth, it deprives you of registers which could be put to use (at least
if your compiler were clever enough).  The only solution to the problem is
to 'inline' (or, at least, to schedule registers globally).  At present,
in most environments, the only way to do this is by manually 'inlining'
the procedures.  Let's hope that more automatic solutions become generally
available in the next few years.

By the way, aside from the aliasing problem in C, Fortran and C are
identical with respect to optimization, register allocation, etc.  So your
implied put-down of Fortran is not relevant in this context.

J. Giles