Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!uwm.edu!bionet!agate!stanford.edu!leland.Stanford.EDU!leland.Stanford.EDU!craig From: craig@leland.Stanford.EDU (Craig Chambers) Newsgroups: comp.object Subject: Re: Run-time checks, Compile time Checks, and reliability Keywords: type checking, high reliability, fault tolerance Message-ID: <1991Apr16.003337.10685@leland.Stanford.EDU> Date: 16 Apr 91 00:33:37 GMT References: <1991Mar28.161307.6071@cbnewsh.att.com> <27F780E2.1872@tct.com> <1991Apr12.190418.13128@leland.Stanford.EDU> <526@eiffel.UUCP> Sender: news@leland.Stanford.EDU (Mr News) Reply-To: craig@self.stanford.edu Organization: Stanford University Lines: 54 In article <526@eiffel.UUCP>, bertrand@eiffel.UUCP (Bertrand Meyer) writes: |> We looked at caching (a variant of which was sketched in Brad Cox's |> January 1984 IEEE Software article), but realized quickly that |> although it could make sense for a Smalltalk-like typeless language |> to decrease the cost of dynamic binding, it was useless for a typed |> language where type information made it possible to obtain a much more |> efficient implementation . I disagree that in-line caching is much less efficient than an indirect function call-based implementation. For our Self implementation on a SPARC using an advanced form of in-line caching, a dynamically-bound message (with no extra type information) takes 8 cycles plus 4 cycles per exact receiver type used at that call site, e.g. a monomorphic send takes 12 cycles, a send with 2 equally-likely receiver types takes on average 14 cycles, a send with 5 equally-likely receiver types takes on average 20 cycles. (These numbers are after initial start-up cache misses.) This sequence contains 1 load instruction, so there's the potential for more cycles if there's a hardware cache miss on this load. For C++ 2.0-style virtual function calls in the presence of multiple inheritance, the code generated by most C++ implementations takes a constant 7 cycles on a SPARC (8 cycles if in the presence of virtual base classes), but with 3 loads (4 loads for virtual base classes), so the likelihood of a hardware cache miss on some of the loads is greater than for the in-line caching scheme. This technique does not interact well with other system services such as garbage collection since it creates pointers into the middle of objects. Some recent work has developed techniques for reducing this cost to C++ 1.0 level with a lot of link-time analysis of the program. I don't know what Eiffel's techniques are; no one has described them publically that I know of, and you seem hesitant to do so here. But assuming it's as fast as C++ 2.0's scheme (which would be a stretch since Eiffel supports GC and other things that would make interior pointers a pain), then you could say that your/C++'s scheme is between 2 and 3 times faster on average than Self's form of in-line caching. However, these few cycles are a pretty low overhead compared to the total cost of calling an external procedure versus inlining a short operation. My original posting was attempting to distinguish between the difference in performance between a non-object-oriented language in which type information allows inlining of short operations and hence a big difference in performance, and an object-oriented language in which type information allows a savings of a few cycles per call but not the gross improvement of inlining. So I stand by my claim that static type information doesn't improve the performance of OO programs anywhere near the extent that it improves the performance of non-OO programs. A little improvement, yes, but not much. The right approach to dramatically improving performance of OO programs is not to add static interface-level type declarations but instead to pursue other techniques to infer low-level type information that support full inlining. -- Craig Chambers