Xref: utzoo comp.lang.c:26952 comp.lang.misc:4474
Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!yale!cmcl2!lanl!lambda!jlg
From: jlg@lambda.UUCP (Jim Giles)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Re: function calls
Message-ID: <14271@lambda.UUCP>
Date: 15 Mar 90 23:45:56 GMT
References: <23113@mimsy.umd.edu>
Lines: 51

From article <23113@mimsy.umd.edu>, by chris@mimsy.umd.edu (Chris Torek):
> [... 'caller-save' vs. 'callee-save' registers ...]
> As shown above, this is no longer true.  If the leaf uses a `large'
> number of registers (more than are available as temporary computation
> registers in non-leaf routines), this statement holds; if not, the
> fact that the routine is a leaf makes the registers `free'.
>
> (Of course, callers that use lots of registers, and store things in
> the temporary registers, must spill those registers on subroutine
> calls.  This may be what Jim Giles meant all along.  Perhaps someone
> at MIPS can post statistics as to how often this is the case.)

This is exactly what I meant.  The Cray system has a similar mechanism
(and, in fact, they even have special types of 'leaf' procedures called
'baselevel' routines).  The problem is that the 'caller' routine still
needs to save 'live' values around calls because the registers assigned
to the 'callee' are nearly always in use.

When I write in assembly, I tend to use all the registers I can in
order to avoid the memory overhead - memory costs about a dozen clocks
per reference while transfers to the temp regs cost only one.  Even
with memory pipelined and running in parallel with other functional
units, this extra delay is expensive.  If I were writing a compiler, I
would be similarly greedy with the registers for generated code.

All this trouble could be avoided if the register use of the 'callee'
were known in advance.  Then the code generator for the 'caller' could
do register scheduling with this extra information in mind.
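
To make the point concrete, here is a minimal C sketch.  It is not taken
from any real compiler, and the names regset, REG, spills_unknown_callee,
spills_known_callee and regset_count are made up for illustration.  It
models each register set as a bit mask and compares what the 'caller'
must save when the 'callee' is a black box with what it must save when
the registers the 'callee' writes are known in advance:

```c
#include <stdint.h>

/* Hypothetical model: one bit per register, bit i set means
   register i belongs to the set. */
typedef uint32_t regset;

#define REG(n) ((regset)1u << (n))

/* Callee usage unknown: the caller has to assume that every
   caller-save register is clobbered, so every live value held
   in one of them is saved around the call. */
regset spills_unknown_callee(regset live, regset caller_save)
{
    return live & caller_save;
}

/* Callee usage known: only live values sitting in registers
   that the callee actually writes need to be saved. */
regset spills_known_callee(regset live, regset callee_clobbers)
{
    return live & callee_clobbers;
}

/* Number of registers in a set, i.e. the number of
   save/restore pairs the caller would emit. */
int regset_count(regset s)
{
    int n = 0;
    while (s) {
        s &= s - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

As an example under these assumptions: with live values in registers 0,
1, 2 and 5, registers 0 through 7 all treated as caller-save, and a
'callee' known to write only registers 1 and 7, the first function asks
for four save/restore pairs and the second asks for one.
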
This still causes problems if the 'callee' uses a _lot_ of registers,
but it's better than nothing.  Of course, the best deal (if speed were
paramount) would be to 'inline' the 'callee' completely.  Then the
register scheduling would take place across the call boundary (and the
save/restore could be hidden better under pipelining).

> [...]
> The only problem with this last statement (`interprocedural analysis
> cannot be done due to separate compilation') is that someone already
> does it---namely, MIPS do it at the highest compilation level.  Again,
> one does it by cheating: `compile' to `Ucode', an intermediate level
> code that is neither source nor object.  When `linking' the Ucode,
> put everything together, do global flow analysis, optimise, and then
> generate machine code.

I've often thought that code generation should be done by the loader
for this very reason.  Both inlining and register scheduling across
calls would be improvements worth the loader slowdown.  In addition,
the compile step would be considerably faster.  This means that syntax
checking would be a breeze (a common use of the compiler, like it or
not, is as a form of 'lint' - at least for non-C languages).

J. Giles