Path: utzoo!attcan!ncrcan!scocan!ron
From: ron@sco.COM (Ron Irvine)
Newsgroups: comp.arch
Subject: Re: Inlining subroutines at link time
Message-ID: <1990Jul6.115849.4556@sco.COM>
Date: 6 Jul 90 15:58:49 GMT
Reply-To: ron@scocan.UUCP (Ron Irvine)
Organization: SCO Canada, Inc. (formerly HCR Corporation)
Lines: 71

The CDC post-loader (called the afterburner) was developed to
optimize the executable. It can inline functions, reallocate
registers across function calls and turn far calls into near calls.
It can be a big win on the CYBER since subroutine calls are expensive.

In general, a post-loader can be a powerful tool for "optimizing"
the executable. It can:

Inline/Outline:
    - inline functions: faster execution, larger code size.
    - outline code (make it into a function); this can reduce code
      size and be profitable on some machines (smaller code may
      fit into cache).

Do global register allocation:
    - this can be of great benefit for processors with global
      registers.

Assign registers across function calls:
    - a post-loader has full program scope (including libraries),
      so it can create register usage information and eliminate
      some register save/restores across function calls.

Use short addresses:
    - data can be grouped to allow short memory references. This
      could be 16 bit offsets instead of 32 bit.
    - or fast access through a data register + small offset.
    - far calls can become near calls.

Change instructions:
    - Instruction sequences can be tuned for a specific machine
      (without a recompile).
    - Patch around hardware/software bugs without changing the
      compiler.
    - Emulate new hardware/software.
    - Yet another place to reposition load/store and delay branch
      code.

Relocate data and functions:
    - This can be done to improve cache hits based on static or
      dynamic information (with feedback from a program run).
    - Data can be grouped so that often used data resides in a
      common page in memory.
    - functions can be grouped so that paging can be minimized
      and startup time reduced.
    - get rid of functions and data that are never used.

Performance monitor:
    - Add code to record the performance of a program. This can
      be done to the "final" executable without the need to
      recompile and link with special libraries.

Misc:
    - add code to memory references to find a specific bug.
    - modify the scope of symbols in an object file (for example,
      this can allow more than one yacc/lex parser in a program).
    - allow a function (or data) to be replaced with a new one.
    - rearrange a production binary so that the function positions
      (addresses) are unique for each copy (and thus can encode a
      serial number for identification).

See "Postloading for Fun and Profit", S.C. Johnson, USENIX, Winter 90.

Ron Irvine, uunet!scocan!ron
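
P.S. For the "get rid of functions that are never used" item, here is
a minimal sketch of how the selection might work. It assumes the
post-loader has already recovered a call graph from the relocation
entries. The names (call_graph, reachable_functions, dead_functions)
and the sample program are made up for illustration; none of this is
taken from the afterburner or from Johnson's paper.

```python
from collections import deque


def reachable_functions(call_graph, roots):
    """Return every function reachable from the given entry points."""
    seen = set(roots)
    work = deque(roots)
    while work:
        fn = work.popleft()
        # A function with no entry in the graph is treated as a leaf.
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen


def dead_functions(call_graph, roots):
    """Return the functions that no entry point can ever reach."""
    all_fns = set(call_graph)
    for callees in call_graph.values():
        all_fns.update(callees)
    return all_fns - reachable_functions(call_graph, roots)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse", "report"],
    "parse": ["lex"],
    "lex": [],
    "report": [],
    "old_report": ["format_v1"],
    "format_v1": [],
}

dead = dead_functions(call_graph, ["main"])
print(sorted(dead))  # ['format_v1', 'old_report']
```

The sketch only follows direct calls. Any function whose address is
taken (stored in a table, passed as a callback) would have to be
added to the roots, or it could be discarded while still in use.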