Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rutgers!ub!oswego!news From: dl@g.g.oswego.edu (Doug Lea) Newsgroups: comp.lang.c++ Subject: Optimized virtual function invocation Message-ID: Date: 25 Aug 90 16:29:53 GMT Sender: news@oswego.Oswego.EDU (Network News) Reply-To: dl@g.oswego.edu Distribution: comp Organization: SUNY Oswego Lines: 130 As Paul Vaughan mentioned (thanks!), I've looked into ways of permitting and performing static resolution, and described my best thoughts for doing so in the '90 USENIX C++ proceedings. I'll outline the highlights here. First, it's important to realize that the goal of static resolution is NOT merely to replace indirect function calls with direct function calls. Happily, virtual function calls usually result in very small, often unnoticeable, performance penalties over direct calls. The goal is instead to enable PROCEDURAL INTEGRATION, achieved by the inlining of a typically small number of crucial member functions, which in turn enables other standard powerful optimization techniques to take place. For example, the sum(Matrix&) routine listed in my paper, in which all access functions (rows(), operator()(i, j), etc.) are virtual, executes less than 2 times faster when virtual calls are replaced by direct calls, but runs 25 times faster when hand-optimized in a way that static resolution could automate. (These are on a Sun4/110 using g++.) Factors of 25 are hard to ignore. People (myself included) sometimes use highly compromised, obscure, and even unsafe design and coding schemes to avoid such things. My goals in exploring static resolution lied not so much in getting raw speed per se (since this can almost always be done now via ugly hacks) but instead in encouraging better design in the first place, by eliminating much of the motivation for such compromise. (See my discussions about `Abstract Base Classes' (ABCs), etc., in the paper.) After thinking to themselves, ``why can't my compiler just figure out how to optimize this code all by itself'', most people's first good idea about how to enable static resolution is to be able to somehow indicate whether a class is a `leaf' class. Since methods of such a class could not be overridden, hardwiring would be possible. I found this unnacceptable for several reasons: 1) As Steve Clamage wrote, it disables an important software reuse idiom in a thoroughly unacceptable manner, since leaf classes could never be subclassed. Thus, the strategy hurts the common practice of inheriting mechanism across implementation classes. 2) Unless ALL leaf classes are required to be designated as such (and perhaps even still, depending on various other details), programmers would still have to write out each and every special leaf class version of polymorphic functions themselves, even though, as is almost always the case, the special versions would contain EXACTLY the same C++ statements, just optimized in different ways for different arguments. This gets out of hand very quickly for functions with several arguments. Moreover, if a programmer decides to modify a hierarchy, changing a leaf into a non-leaf class, all of this work would have to be manually redone. 3) Because of (2), as Steve and others have also pointed out, this leads to no pragmatic improvement over existing constructs, since you can now manually invoke subclass routines inside functions anyway, albeit in a sometimes-ugly and always-dangerous manner. (Dangerous, since such routines can fail when new subclasses are introduced. Despite other failings, specially denoting leaf classes would prevent this.) The alternative strategy I laid out is to make support for polymorphism-by-synthesis (as opposed to polymorphism-by-dispatch) a first-class mechanism in C++. The technique is called `customization', and is based on work originally described by Ungar, Chambers, and others developing the SELF compiler. The strategy amounts to alerting the compiler (via a barely-defensible bastardization/alternative usage of the `template' keyword) that all specializations of a method or function should be resynthesized (customized) by the compiler rather than relying on virtual function calls that otherwise allow the same exectuable code to work with each different subclass argument it might encounter. This can be implemented via an internal macro-expander-like system driven by a type inferencer. For example, if there were a base class `Matrix', a function `float sum(template Matrix& m)', a Matrix subclass `RowMajorMatrix', and finally client code RowMajorMatrix a; float s = sum(a) then a specialized version of `float sum(RowMajorMatrix& m)', specifically optimized for RowMajorMatrix (and not any subclass thereof!) would be automatically be synthesized to handle the `float s = sum(a)' call. Of course, there are requirements for backup strategies, involving continued use of dispatch-based versions for cases where the exact type of an argument is unknown at compile time, along with various other niceties described in the paper (as well as a few that weren't -- write for details if you are interested.) The most controversial aspects of the proposal are 1) It has surprisingly far-reaching effects on the C++ type system itself, including a few outright conflicts where static resolution and dynamic resolution can currently have differing effects. (Of course, I argue that all such changes are for the best! Just for one example, it provides a way for programmers to avoid `slicing' via customized call by value, which addresses a topic recently discussed here.) 2) Even though several of the most important cases where customization can be applied are relatively straightforward to implement, getting the whole thing right doesn't look to be doable in an afternoon, a week, a month, or perhaps even a year. 3) It has consequences for C++ environments. Means for programmers to have some say about tradeoffs of potential code bloat for efficiency, prospects for integrating customization with parameterized types, and impact on the separate compilation model all need to be addressed. Many people apparently believe that there might be a means to get similar effects with less impact on the language and current compilers. I remain unconvinced that this is possible. The fact that both the SELF and TS (typed smalltalk) designers and implementers use variations on customization extensively to obtain their impressive performance improvements over standard OO compilation schemes lends credence to the notion that customization can play an important role in maintaining C++'s usual edge in efficiency compared to other OO languages. However, one viable alternative is to incorporate customization into a metalanguage with associated tools comprising a C++ programmers workbench, rather than into the language proper. I've been exploring this too. -Doug -- Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2688 email: dl@g.oswego.edu or dl@cat.syr.edu UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl