Xref: utzoo comp.lang.c:26926 comp.lang.misc:4460 Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!aplcen!haven!mimsy!chris From: chris@mimsy.umd.edu (Chris Torek) Newsgroups: comp.lang.c,comp.lang.misc Subject: Re: function calls Message-ID: <23113@mimsy.umd.edu> Date: 15 Mar 90 08:23:49 GMT References: <1990Mar14.172152.25436@utzoo.uucp> <14268@lambda.UUCP> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 81 In article <14268@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: >Most machines implement call/return with single instructions. However, this >tends to be the tip of the iceberg for procedure call overhead. The interface >_MUST_ do the following: > >1) Save/restore all register values that are still 'live' values for the > caller. Right. >2) Test the stack to make sure there is enough room for the callee's local > variables (and call the memory manager for a new stack frame if there > wasn't room). True, but (as Jim Giles points out) VM machines can cheat. (He does not say how to cheat, but it is very easy: do not test. Simply try to use the new stack address(es). If they are invalid, the kernel takes an access fault and decides whether to grow the stack or to terminate the program.) >3) Create (on the stack) a traceback entry so the debugger and/or the > postmortem analyzer can find the current thread through the call tree. There is a way to cheat on this as well; MIPS use it in their compilers. When a function does not need a frame pointer---most do not---it gets instead a `virtual frame pointer' which is simply an offset from the stack pointer. The debugger reads the symbol table of the object file to find out which register is the stack pointer, and what the offset from this register is, so as to find the previous call (and any saved registers, using a compiler-generated field also found in the object file). >The problem is: _MOST_ of the procedure call overhead is concentrated in >number (1)! And, this overhead applies to all procedures - including >'leaf' routines. True; and this is where most of the effort at reducing function call overhead must be applied. Fortunately, some smart guys have figured out some good methods to handle this. On the MIPS, for instance, the system designates some registers as `caller saves' and some registers as `callee saves'. A leaf procedure heads straight for the `caller saves' registers; others head for the `callee saves' registers. In other words, leaf routines can (provided they do not need a huge number of registers) get away with saving and restoring nothing. (They need not even save and restore the stack pointer, if a `callee saves' register is free: simply do something like add #-40,standard_sp,temp_reg_3 // set temp_reg to sp-40 and then use temp_reg_3 as your `stack pointer'. >Basically, 'leafness' has very little to do with procedure call overhead. As shown above, this is no longer true. If the leaf uses a `large' number of registers (more than are available as temporary computation registers in non-leaf routines), this statement holds; if not, the fact that the routine is a leaf makes the registers `free'. (Of course, callers that use lots of registers, and store things in the temporary registers, must spill those registers on subroutine calls. This may be what Jim Giles meant all along. Perhaps someone at MIPS can post statistics as to how often this is the case.) >Because modern machines tend to have a _large_ number of registers, >implementing (1) is usually quite expensive - and it can't be made >cheaper without interprocedural analysis which in turn can't be done as >long as separate compilation is possible. The only problem with this last statement (`interprocedural analysis cannot be done due to separate compilation') is that someone already does it---namely, MIPS do it at the highest compilation level. Again, one does it by cheating: `compile' to `Ucode', an intermediate level code that is neither source nor object. When `linking' the Ucode, put everything together, do global flow analysis, optimise, and then generate machine code. (Of course, linking starts to get very slow, encouraging people to run with lower optimisation levels. This is more or less as it should be, though: you choose the trade-off of compilation time for run time.) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris