Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!sdd.hp.com!wuarchive!uunet!world!iecc!compilers-sender
From: mcg@ichips.intel.com (Steven McGeady)
Newsgroups: comp.compilers
Subject: Re: the Evil Effects of Inlining
Keywords: design, optimize
Message-ID: <9105070055.AA08265@ishark>
Date: 7 May 91 00:55:16 GMT
References: <1991May1.035622.25021@daffy.cs.wisc.edu> <1991May2.180508.17100@rice.edu>
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: mcg@ichips.intel.com (Steven McGeady)
Organization: Compilers Central
Lines: 66
Approved: compilers@iecc.cambridge.ma.us

I've just read the thread on inlining (through 5 May 91), and have a few
comments to add, as an implementor:

    - respondents don't seem to be making a distinction between
      inlining as a programmatic, user-specified extension, and
      inlining as a transparent, compiler-implemented optimization.
      While closely related, I feel these two types of inlining must
      be addressed separately:

        - user-specified inlining is only as good as the user's
          understanding of his or her program.  In situations where
          the user has a deep understanding of the performance
          behaviour of the program under study, user-directed
          inlining can be a powerful tool.  When I wrote 'inline', a
          stand-alone C-to-C inliner, I carefully studied several
          programs, including 'compress'.  Careful profiling followed
          by inlining resulted in a 10% performance improvement, even
          in this carefully-optimized program.

        - heuristic inlining is only as good as the heuristic (duh).
          Our research indicates that we haven't yet found a good
          heuristic that works without profiling feedback.  We've
          tried to synthesize a heuristic from call-graph,
          register-pressure, and size information, without repeatable
          success (i.e. success over a broad selection of programs).
          Heuristics that include profiling input (a weighted dynamic
          call tree) can repeatably produce improvements in most
          programs, without causing serious regressions.
          (Our compiler does global (inter-module) inlining with a
          two-pass model.)

      Unfortunately, users often think they know more about their
      programs than they actually do, and many don't have the tools,
      or are too lazy, to measure their programs.  Many inlining
      decisions users make are just plain wrong.  Heuristic inliners
      like gcc's make the user's task easier: try it both ways, and
      pick the faster.  This doesn't validate the practice of
      inlining; it merely provides commentary on the effectiveness of
      gcc's heuristic (which is: not particularly).

    - several respondents have noted that good interprocedural
      dataflow analysis can yield better results.  In theory, I agree
      (on processors where calls are relatively cheap); however, true
      REF/DEF dataflow information can quickly become intractable (or
      at least very difficult) in a large program, when attempted
      across the entire program (for C, when tracking all points-to
      information).  So if global DFA is limited to a single
      procedure, inlining frequently-traversed arcs on the call-graph
      can dramatically improve the overall effectiveness of DFA-based
      optimizations.

    - along the same lines as the last point, inlining can also
      expose many other worthwhile optimizations that can't
      profitably be done on an intermodule basis.  In particular,
      until call tailoring becomes a reality (including debug
      support!) I consider the utility of some classes of inlining to
      be high, when guided by profile information.

Summary:

    - 'inlining' means two different things
    - user-inlining is effective only for sophisticated users
    - compiler heuristic inlining is currently hampered by poor
      heuristics
    - profile information considered essential for inlining
      heuristics
    - intermodule global DFA considered difficult to intractable
    - intelligent profile-driven inlining is a Good Thing

S. McGeady
i960 Software Architecture Group
Intel Corp.
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.
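
Editorial illustration: the post argues for inlining the
frequently-traversed arcs of a profile-weighted call graph while
keeping code growth in check.  One way that idea might be sketched is
shown below.  Everything in it is an assumption made for illustration:
the CallArc record, the choose_arcs_to_inline function, the
growth_budget and min_calls parameters, and the greedy hottest-first
policy itself.  None of it is taken from the i960 compiler, from gcc,
or from the 'inline' tool mentioned in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallArc:
    """One caller -> callee arc of the call graph (hypothetical record)."""
    caller: str
    callee: str
    dynamic_calls: int  # times the arc was traversed in a profiling run
    callee_size: int    # rough size of the callee body, in arbitrary units


def choose_arcs_to_inline(arcs, growth_budget, min_calls=1):
    """Greedily pick the hottest arcs whose callee bodies fit a size budget.

    An assumed policy for illustration only, not a published heuristic.
    """
    chosen = []
    remaining = growth_budget
    # Visit arcs from most to least frequently traversed.
    for arc in sorted(arcs, key=lambda a: a.dynamic_calls, reverse=True):
        if arc.caller == arc.callee:
            continue  # skip directly recursive arcs
        if arc.dynamic_calls < min_calls:
            break  # every later arc is at least as cold
        if arc.callee_size <= remaining:
            chosen.append(arc)
            remaining -= arc.callee_size
    return chosen


# Hypothetical profile data: two hot arcs and two cold ones.
example_arcs = [
    CallArc("main", "next_code", 1_000_000, 40),
    CallArc("main", "usage", 1, 25),
    CallArc("output", "put_byte", 500_000, 30),
    CallArc("main", "init", 1, 200),
]
picked = choose_arcs_to_inline(example_arcs, growth_budget=80, min_calls=1000)
print([arc.callee for arc in picked])
```

With the made-up profile above, only the two hot arcs are selected;
the cold arcs are never considered, which is the property the post
attributes to profile feedback and finds missing from purely static
heuristics.  A real compiler would also have to account for register
pressure and for the new call sites that each inlined body exposes,
which this sketch ignores.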