Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!maverick.ksu.ksu.edu!zaphod!pacific.mps.ohio-state.edu!linac!att!att!westmark!mole-end!mat
From: mat@mole-end.UUCP (Mark A Terribile)
Newsgroups: comp.lang.c++
Subject: Re: Why is this program slow? / C++ versus C Performance
Summary: 'tain't just a question of SMARTS, y'know, but of knowin' the rules ...
Message-ID: <470@mole-end.UUCP>
Date: 14 Jan 91 01:28:48 GMT
References: <1991Jan12.040453.6887@mentor.com>
Distribution: usa
Organization: mole-end--private system. admin: mole-end!newtnews
Lines: 107

> ... Of course, this summary isn't the "final word", so I hope people will
> post corrections, clarifications, and comments.

> THE ISSUE
> ---------
> A simple case was developed where a C++ program took almost twice as
> long (about 170%) to run as a similar C program. It illustrates one
> situation where C++ appears slower than C.

> REASON FOR SLOWER PERFORMANCE
> -----------------------------
> The C program is written in terms of basic operations that the compiler
> can easily optimize while the C++ version isn't. The C++ version ends up
> having to construct and copy a temporary class Vector object. This extra
> work accounts for the difference in run time.

> DISCUSSION
> ----------
> The problem is that cfront isn't smart enough to construct the vector sum
> directly in the target variable 'a' instead of creating an intermediary
> temporary object.

Well, I'm not so sure of that.  There is a guarantee that the temporary
objects WILL be created so that the side effects of constructors will occur--
EXCEPT in a very few circumstances involving a copy constructor used as the
second constructor in an initialization.  The problem, as coded, explicitly
excludes any of these circumstances.  (It's not that difficult to change so
that one of the circumstances occurs, as we have seen.)

In theory, a `sufficiently smart' compiler could determine that there are no
side effects and that computation can proceed without creating the temporary,
and since the function is expanded inline, this would be one of the easier
cases for the compiler to detect.  (Without the inlining, the optimization and
code generation would have to take place AFTER symbol resolution, that is, in
the middle of the linking process.)

> 1. a[ 0 ] = a[ 0 ] + b[ 0 ] ;
> 2. a[ 0 ] += b[ 0 ] ;
>
> modern compilers should identify the two cases as equivalent and generate the
> most efficient code, but cfront is not so fortunate. ...

Modern compilers which have semantics for the type built-in, yes.  For C++,
no.  Perhaps it will one day be possible to specify more about the operator
semantics, but for a language like C++, I suspect that awaits some theory of
programming language semantics.

> The problem goes beyond cfront not being able to recognize equivalent forms.
> The sort of C code cfront generates can actually interfere with the
> downstream C compiler's ability to perform optimization. ...

C++ asks more of a compiler, and for the near future, that will mean that
certain things that can be optimized in C-like code cannot be optimized when
they cross the boundaries provided by C++'s ways of organizing programs.

If the low-level performance of a type is important, then the type must be
designed and written carefully.  It's true that we've come to rely a lot on
optimizing compilers and less on how we code, but we can't rely on compilers
to optimize across separate compilation boundaries, nor to replace weak,
expensive algorithms with good ones (even if code hoisting and strength
reduction can sometimes have that effect).

We have to seek to reduce wasted motion in more-or-less machine independent
ways.  Such ways exist (e.g. the value-building constructor in the return
statement).
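
A minimal sketch of what the value-building constructor in the return
statement might look like, with an operator+= shown beside it for comparison.
The Vector class here is hypothetical: the three double members, the accessors
x(), y(), z(), and the helper check_sums_agree() are assumptions made for
illustration, not the class that was timed in the original thread.

```cpp
#include <cassert>

// Hypothetical three-component Vector (an assumption; the class from the
// thread under discussion is not reproduced in this post).
class Vector {
public:
    // The "value builder": constructs a Vector directly from its components.
    Vector(double x, double y, double z) : x_(x), y_(y), z_(z) {}

    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }

    // Compound assignment updates the left operand in place, so the
    // source code itself asks for no temporary Vector at all.
    Vector& operator+=(const Vector& rhs) {
        x_ += rhs.x_;
        y_ += rhs.y_;
        z_ += rhs.z_;
        return *this;
    }

private:
    double x_, y_, z_;
};

// The result is built by the constructor call in the return statement,
// rather than by filling in a named local Vector and copying it out.
inline Vector operator+(const Vector& lhs, const Vector& rhs) {
    return Vector(lhs.x() + rhs.x(), lhs.y() + rhs.y(), lhs.z() + rhs.z());
}

// Hypothetical self-check: both spellings must produce the same value.
inline void check_sums_agree() {
    Vector a(1, 2, 3);
    Vector b(10, 20, 30);
    Vector c = a + b;   // form 1: binary operator+
    a += b;             // form 2: in-place operator+=
    assert(a.x() == c.x() && a.y() == c.y() && a.z() == c.z());
}
```

The sketch only shows the two ways of writing the operation.  Whether a given
compiler actually drops the temporaries for the first form is a separate
question, and nothing here demonstrates it.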

> Eventually, the code optimization restrictions and problems of cfront
> will be resolved by one or more of the following:

I would say rather `ameliorated,' since it's not clear to me that a compiler,
given separate code generation, can analyze away all the extra motion that
C++'s model seems (in the absence of optimization) to imply.

>    o cfront is modified to generate optimized C code,
>
>    o cfront is changed to generate C code in a form that can be
>      easily optimized by existing C compilers,
>
>    o C compilers are developed that are better tuned to the
>      C code that cfront presently generates.
>
>    o reliable, fully optimized C++ native code compilers are
>      developed.

I believe that another step is needed: code generation and optimization AFTER
symbol resolution.  Unfortunately, it will be years before we see this, even
if one language or machine vendor conclusively proves its worth.

> WORKAROUND
> ----------
>
> One way to work around the problem is to define a '+=' operator for the
> Vector class. ...

Another is to reduce by at least half the wasted motion in operator+(), by
using the constructor value builder.  With inlines, this may clue the
compiler that all the temporaries can be eliminated.

Mark T. !FunkyStuff
--
 (This man's opinions are his own.)
 From mole-end                          Mark Terribile