Xref: utzoo comp.lang.c:27051 comp.lang.misc:4524 Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!samsung!zaphod.mps.ohio-state.edu!rpi!sci.ccny.cuny.edu!phri!cmcl2!lanl!lambda!jlg From: jlg@lambda.UUCP (Jim Giles) Newsgroups: comp.lang.c,comp.lang.misc Subject: Re: A note for those not consumed by efficiency worries (was: function calls) Message-ID: <14278@lambda.UUCP> Date: 19 Mar 90 23:02:46 GMT References: <1990Mar17.190858.13930@athena.mit.edu> Lines: 107 From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit): > It is unfortunate that perpetual concerns about microoptimization > tend to suggest that function call overhead is unacceptable and > should be avoided. For most programs, readability and > maintainability are far more important than efficiency, and good > modularity (accompanied by lots of function calls) is of course > an important property of a well-written, readable and > maintainable program. I quite agree. No one in this thread of discussion has suggested otherwise, certainly not me. In fact, I have argued this very point on this newsgroup in the recent past (against someone who thought the programmer should do his own common expression eliminations, etc.). If your most critical code issue is readibility, maintainability, or just_get_it_out_the_door_before_ the_deadline, then concentration on optimization issues is not likely to pay off. HOWEVER, there _ARE_ some contexts where speed is everything. Procedure call overhead is one area where the compiler/environment cannot yet be expected to do the optimizations for you (in fact, the compiler _CAN'T_ do it if separate compilation is allowed). > [...] > (Remember, too, that function calls on any machine are by no > means "slow." This discussion is splitting hairs between > "extremely fast" and "ridiculously fast;" the concerns about > having to save state in a stack frame merely indicate that it is > nontrivial to make things ridiculously fast.) This is simply not true. Procedure calls are the slowest 'single operations' that high-level languages provide. On the Cray, for example, call/return overhead varies from a low of several dozen to a high of a few hundred clocks! In fact, for 'baselevel' or 'leaf' procedures, the overhead often exceeds the the time spent in the procedure by multiple factors. If you claim that procedure calls are fast, let's see your benchmark. Note: method for benchmarking call/return overhead. Take a production code. Fetch the real-time clock before and after a 'leaf' procedure call. 'Inline' the procedure and time the 'inlined' code. compare the times. No fair using 'dummy' drivers to make the call - you need an active 'caller' which has 'live' values to be saved/restored. > In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: >> [... 24 registers is a small ceiling ... ] > > I'd dispute this, and I'm not even religious about keeping all of > my subroutines smaller than a page or anything. Few of my > routines even have 24 variables, let alone 24 variables that have > good reason to be put in registers. [...] The smallest routine I've ever written (this was for a class, years ago) was a matrix multiply on fixed-sized matrices (they were always guaranteed to be 3 by 3). Say the array arguments are A, B, and C - I want to compute the value of A=B*C. Each of the 18 elements of B and C are used 3 times during the calculation - each of the 9 elements in A are summed to 3 times. For efficiency, you want to load each element of the arguments only once. Similarly, you want to store each element of A only once. The rest of the time, you want the data in registers. If speed were paramount, I would unroll the loops of the matrix multiply and do the whole thing explicitly. I might be able to find an order in which to do the calculation which only needed 24 registers, but the the obvious method would be to put each of the array elements in a separate register - that's 27 'live' values! Note that the above was the _smallest_ procedure I've written! Normally I would not even consider doing matrix multiply as a procedure call (the idiom of three nested loops, etc., is simple to read and widely recognized and understood). In a language like C, this problem would be a good candidate for a MACRO. To be sure, there are certainly procedures which don't need as many as 24 registers. But, if speed were paramouint, I suspect that nearly all of these would be immediate candidates for 'inlining'. The only exceptions to this would be assembly routines which used hardware/instructions which were not accessible from the high-level language. > [...] Good code, > particularly object-oriented code, has lots and lots of small > routines, but they don't all need inlining. I disagree. Good code is neither monolithic nor fragmented. There is a compromise between these two tendencies which is better than either. A lot of the OOP that I've seen leans way too far toward fragmentation. I've seen a matrix multiply method implemented as a sequence of calls to a vector dot product method, which in turn does a vector elemental multiply method and a vector summation method. The problem is that matrix multiply is a _single_ operation - it's actually _easier_ to read and maintain if it's all done in one place. > [...] (A medium-sized -- > i.e. small but not tiny -- routine that's called a lot shouldn't > necessarily be inlined. Code size still matters.) I agree, with a proviso: code should _always_ be 'inlined' if the user explicitly asks for it. Otherwise, the rules for automatic 'inlining' should be: 1) 'Inline' if the procedure/method is only called once in the entire call tree. 2) 'Inline' if the procedure/method body is smaller than some multiple of the call/return overhead code (including register save/restores). Otherwise, don't 'inline'. Of course, this requires a language environment which allows 'inlining' at all. I am hopeful that this ability may soon (within this decade) be commonplace. But, who knows? J. Giles