Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!uwm.edu!bionet!agate!stanford.edu!leland.Stanford.EDU!leland.Stanford.EDU!craig
From: craig@leland.Stanford.EDU (Craig Chambers)
Newsgroups: comp.object
Subject: Re: Run-time checks, Compile time Checks, and reliability
Keywords: type checking, high reliability, fault tolerance
Message-ID: <1991Apr16.003337.10685@leland.Stanford.EDU>
Date: 16 Apr 91 00:33:37 GMT
References: <1991Mar28.161307.6071@cbnewsh.att.com> <27F780E2.1872@tct.com> <1991Apr12.190418.13128@leland.Stanford.EDU> <526@eiffel.UUCP>
Sender: news@leland.Stanford.EDU (Mr News)
Reply-To: craig@self.stanford.edu
Organization: Stanford University
Lines: 54

In article <526@eiffel.UUCP>, bertrand@eiffel.UUCP (Bertrand Meyer) writes:
|> We looked at caching (a variant of which was sketched in Brad Cox's
|> January 1984 IEEE Software article), but realized quickly that
|> although it could make sense for a Smalltalk-like typeless language
|> to decrease the cost of dynamic binding, it was useless for a typed
|> language where type information made it possible to obtain a much more
|> efficient implementation .

I disagree that in-line caching is much less efficient than an
indirect function call-based implementation.  For our Self
implementation on a SPARC using an advanced form of in-line caching, a
dynamically-bound message (with no extra type information) takes 8
cycles plus 4 cycles per exact receiver type used at that call site,
e.g. a monomorphic send takes 12 cycles, a send with 2 equally-likely
receiver types takes on average 14 cycles, a send with 5
equally-likely receiver types takes on average 20 cycles.  (These
numbers are after initial start-up cache misses.)  This sequence
contains 1 load instruction, so there's the potential for more cycles
if there's a hardware cache miss on this load.

For C++ 2.0-style virtual function calls in the presence of multiple
inheritance, the code generated by most C++ implementations takes a
constant 7 cycles on a SPARC (8 cycles if in the presence of virtual
base classes), but with 3 loads (4 loads for virtual base classes), so
the likelihood of a hardware cache miss on some of the loads is
greater than for the in-line caching scheme.  This technique does not
interact well with other system services such as garbage collection
since it creates pointers into the middle of objects.  Some recent
work has developed techniques for reducing this cost to C++ 1.0 level
with a lot of link-time analysis of the program.

I don't know what Eiffel's techniques are; no one has described them
publically that I know of, and you seem hesitant to do so here.  But
assuming it's as fast as C++ 2.0's scheme (which would be a stretch
since Eiffel supports GC and other things that would make interior
pointers a pain), then you could say that your/C++'s scheme is between
2 and 3 times faster on average than Self's form of in-line caching.
However, these few cycles are a pretty low overhead compared to the
total cost of calling an external procedure versus inlining a short
operation.  My original posting was attempting to distinguish between
the difference in performance between a non-object-oriented language
in which type information allows inlining of short operations and
hence a big difference in performance, and an object-oriented language
in which type information allows a savings of a few cycles per call
but not the gross improvement of inlining.  So I stand by my claim
that static type information doesn't improve the performance of OO
programs anywhere near the extent that it improves the performance of
non-OO programs.  A little improvement, yes, but not much.  The right
approach to dramatically improving performance of OO programs is not
to add static interface-level type declarations but instead to pursue
other techniques to infer low-level type information that support full
inlining.

-- Craig Chambers