Xref: utzoo comp.lang.c:26971 comp.lang.misc:4482
Path: utzoo!attcan!uunet!wuarchive!usc!zaphod.mps.ohio-state.edu!think!paperboy!meissner
From: meissner@osf.org (Michael Meissner)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Re: function calls
Message-ID:
Date: 16 Mar 90 15:32:26 GMT
References: <1990Mar15.173408.29622@utzoo.uucp> <14272@lambda.UUCP>
Sender: news@OSF.ORG
Organization: Open Software Foundation
Lines: 105
In-reply-to: jlg@lambda.UUCP's message of 16 Mar 90 00:30:59 GMT

In article <14272@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

| Careful register-allocation conventions are usually the ones that use the
| registers most greedily.  This is because, in general, the more you use
| the registers, the faster your code goes.  If your "careful" approach to
| registers is not greedy, how much performance am I losing to it by not
| getting full use out of the hardware available?  Further: what's "an
| adequate supply of registers"?  I know code which can use up as many as
| you give me.  In fact, if interprocedural analysis were commonplace
| instead of rare, you would probably find that the register set was almost
| always completely full (this wouldn't matter since it would be a logical
| consequence of register allocation being done on the program instead of
| a routine at a time).
|
| These problems may all be solved in the future - even the very near future.
| But at present, only MIPS and Cray (the only ones mentioned anyway) have
| addressed this problem.  And these two 'solutions' rely on the 'callee'
| not using lots of registers and the 'caller' deliberately leaving some
| spare ones - but this, in itself, may have a negative impact on performance.

Umm, the 88k also partitions the register set into caller-save and
callee-save registers.
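To make the caller-save/callee-save split concrete, here is a minimal
C sketch.  The function names scale and sum_scaled are made up for
illustration, and the register placement described in the comments is
only what an allocator would typically prefer under such a convention,
not what any particular compiler is guaranteed to do:

```c
/* Hypothetical leaf routine: 't' is dead before any call is made, so it
   can live in a register that is NOT preserved across calls, and the
   routine has nothing to save or restore on entry and exit. */
int scale(int x)
{
    int t = x * 3;
    return t + 1;
}

/* 'sum', 'i' and 'n' are all live across the call to scale().  With a
   partitioned register set they are natural candidates for registers
   that ARE preserved across calls; otherwise the caller would have to
   spill and reload them around every call in the loop. */
int sum_scaled(int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += scale(i);
    return sum;
}
```

For example, sum_scaled(4) adds up 1 + 4 + 7 + 10 and returns 22.  The
point is only that values live across a call and short-lived temps want
different halves of the register set, which is why neither half can be
made arbitrarily small.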
For the two machines that I'm familiar with, the breakdown is as
follows:

    MIPS:   32 integer 32-bit registers
                 1 register hardwired to 0
                 2 registers used for return value and/or static link
                 4 registers used for arguments
                10 registers not preserved across calls
                 9 registers preserved across calls
                 1 register for stack pointer
                 1 register for return address
                 1 register for accessing small static/global data
                 1 register used by the assembler
                 2 registers used by the kernel

            16 floating point 64-bit registers
                 2 registers for return value
                 2 registers for arguments
                 6 registers not preserved across calls
                 6 registers preserved across calls

    88K:    32 32-bit registers (double precision takes 2 regs)
                 1 register hardwired to 0
                 8 registers for arguments & return value
                 4 registers not preserved across calls
                13 registers preserved across calls
                 4 registers reserved for linker/OS
                 1 register for the stack pointer
                 1 register for the return address

Note that the registers used for passing arguments and returning
values are also used as temps.  If the return address is stored on the
stack, the register that holds it can also be used as a temporary.
Neither architecture requires the use of a frame pointer, though a
frame pointer can be synthesized easily when one is needed because
variable-sized stack allocations are done.  Finally, on both machines
software defines static tables that describe where registers are
stored on the stack, and what register and offset from that register
are to be used as a virtual frame pointer, for use in the library and
in the debugger.

The MIPS compilers also have a -O3 option which does global register
allocation.  Here is a fragment of the man page from a Decstation:

     -O3       Perform all optimizations, including global register
               allocation.  This option must precede all source file
               arguments.  With this option, a ucode object file is
               created for each C source file and left in a .u file.
               The newly created ucode object files, the ucode object
               files specified on the command line, the runtime
               startup routine, and all the runtime libraries are
               ucode linked.  Optimization is performed on the
               resulting ucode linked file and then it is linked as
               normal producing an a.out file.  A resulting .o file is
               not left from the ucode linked result.  In fact -c
               cannot be specified with -O3.

     -feedback file
               Use with the -cord option to specify the feedback file.
               This file is produced by with its -feedback option from
               an execution of the program produced by

     -cord     Run the procedure-rearranger on the resulting file
               after linking.  The rearrangement is performed to
               reduce the cache conflicts of the program's text.  The
               output is left in the file specified by the -o output
               option or a.out by default.  At least one -feedback
               file must be specified.

Because of the restriction of not specifying -c, I'm not sure how many
people use this in practice for large software.  I would imagine that
for programs which use runtime binding (i.e., emacs, or C++ code with
virtual functions), it would default back to the standard calling
sequence.  I wonder how much it buys for typical software as opposed
to special cases.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so