Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rutgers!ub!oswego!news
From: dl@g.g.oswego.edu (Doug Lea)
Newsgroups: comp.lang.c++
Subject: Optimized virtual function invocation
Message-ID: <DL.90Aug25123348@g.g.oswego.edu>
Date: 25 Aug 90 16:29:53 GMT
Sender: news@oswego.Oswego.EDU (Network News)
Reply-To: dl@g.oswego.edu
Distribution: comp
Organization: SUNY Oswego
Lines: 130


As Paul Vaughan mentioned (thanks!), I've looked into ways of
permitting and performing static resolution, and described my best
thoughts for doing so in the '90 USENIX C++ proceedings. I'll outline
the highlights here.

First, it's important to realize that the goal of static resolution is
NOT merely to replace indirect function calls with direct function
calls. Happily, virtual function calls usually result in very small,
often unnoticeable, performance penalties over direct calls. The goal
is instead to enable PROCEDURAL INTEGRATION, achieved by the inlining
of a typically small number of crucial member functions, which in turn
enables other standard powerful optimization techniques to take place.
For example, the sum(Matrix&) routine listed in my paper, in which all
access functions (rows(), operator()(i, j), etc.)  are virtual,
executes less than 2 times faster when virtual calls are replaced by
direct calls, but runs 25 times faster when hand-optimized in a way
that static resolution could automate. (These are on a Sun4/110 using
g++.)

Factors of 25 are hard to ignore. People (myself included) sometimes
use highly compromised, obscure, and even unsafe design and coding
schemes to avoid such things. My goals in exploring static resolution
lied not so much in getting raw speed per se (since this can almost
always be done now via ugly hacks) but instead in encouraging better
design in the first place, by eliminating much of the motivation for
such compromise. (See my discussions about `Abstract Base Classes'
(ABCs), etc., in the paper.)

After thinking to themselves, ``why can't my compiler just figure
out how to optimize this code all by itself'', most people's first
good idea about how to enable static resolution is to be able to
somehow indicate whether a class is a `leaf' class. Since methods of
such a class could not be overridden, hardwiring would be possible. I
found this unnacceptable for several reasons:

    1) As Steve Clamage wrote, it disables an important software reuse
        idiom in a thoroughly unacceptable manner, since leaf classes
        could never be subclassed.  Thus, the strategy hurts the
        common practice of inheriting mechanism across implementation
        classes.

    2) Unless ALL leaf classes are required to be designated as such
        (and perhaps even still, depending on various other details),
        programmers would still have to write out each and every
        special leaf class version of polymorphic functions
        themselves, even though, as is almost always the case, the
        special versions would contain EXACTLY the same C++
        statements, just optimized in different ways for different
        arguments. This gets out of hand very quickly for functions
        with several arguments. Moreover, if a programmer decides to
        modify a hierarchy, changing a leaf into a non-leaf class, all
        of this work would have to be manually redone.

   3) Because of (2), as Steve and others have also pointed out, this
        leads to no pragmatic improvement over existing constructs,
        since you can now manually invoke subclass routines inside
        functions anyway, albeit in a sometimes-ugly and
        always-dangerous manner. (Dangerous, since such routines can
        fail when new subclasses are introduced. Despite other failings,
        specially denoting leaf classes would prevent this.)

The alternative strategy I laid out is to make support for
polymorphism-by-synthesis (as opposed to polymorphism-by-dispatch) a
first-class mechanism in C++. The technique is called `customization',
and is based on work originally described by Ungar, Chambers, and
others developing the SELF compiler. The strategy amounts to alerting
the compiler (via a barely-defensible bastardization/alternative usage
of the `template' keyword) that all specializations of a method or
function should be resynthesized (customized) by the compiler rather
than relying on virtual function calls that otherwise allow the same
exectuable code to work with each different subclass argument it might
encounter.  This can be implemented via an internal
macro-expander-like system driven by a type inferencer.

For example, if there were a base class `Matrix', a function `float
sum(template Matrix& m)', a Matrix subclass `RowMajorMatrix', and
finally client code
    RowMajorMatrix a; float s = sum(a)
then a specialized version of `float sum(RowMajorMatrix& m)',
specifically optimized for RowMajorMatrix (and not any subclass
thereof!)  would be automatically be synthesized to handle the `float
s = sum(a)' call.  Of course, there are requirements for backup
strategies, involving continued use of dispatch-based versions for
cases where the exact type of an argument is unknown at compile time,
along with various other niceties described in the paper (as well
as a few that weren't -- write for details if you are interested.)

The most controversial aspects of the proposal are 

    1) It has surprisingly far-reaching effects on the C++ type system
        itself, including a few outright conflicts where static
        resolution and dynamic resolution can currently have differing
        effects.  (Of course, I argue that all such changes are for
        the best!  Just for one example, it provides a way for
        programmers to avoid `slicing' via customized call by value,
        which addresses a topic recently discussed here.)

    2) Even though several of the most important cases where
        customization can be applied are relatively straightforward to
        implement, getting the whole thing right doesn't look to be
        doable in an afternoon, a week, a month, or perhaps even a
        year. 

    3) It has consequences for C++ environments. Means for programmers
        to have some say about tradeoffs of potential code bloat for
        efficiency, prospects for integrating customization with
        parameterized types, and  impact on the separate
        compilation model all need to be addressed.

Many people apparently believe that there might be a means to get
similar effects with less impact on the language and current
compilers.  I remain unconvinced that this is possible. The fact that
both the SELF and TS (typed smalltalk) designers and implementers use
variations on customization extensively to obtain their impressive
performance improvements over standard OO compilation schemes lends
credence to the notion that customization can play an important
role in maintaining C++'s usual edge in efficiency compared to other
OO languages.

However, one viable alternative is to incorporate customization into a
metalanguage with associated tools comprising a C++ programmers
workbench, rather than into the language proper. I've been exploring
this too.

-Doug
--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2688
email: dl@g.oswego.edu            or dl@cat.syr.edu
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl