Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!zaphod.mps.ohio-state.edu!wuarchive!uunet!zephyr.ens.tek.com!tektronix!sequent!mntgfx!timh From: timh@.mentorg.com (Tim Harvey @ APD x1381) Newsgroups: comp.lang.c++ Subject: Why is this program slow? / C++ versus C Performance Message-ID: <1991Jan12.040453.6887@mentor.com> Date: 12 Jan 91 04:04:53 GMT Sender: timh@mentor.com (Tim Harvey @ APD x1381) Distribution: usa Organization: engr Lines: 230 Peter Shirley posted the article (# 10640) "why is this program slow?" I found the topic he brought up (C++ versus C performance) and the discussion that followed useful to some work I'm doing and I wrote up the following summary. I'm posting it for the benefit of any one else who might be interested this topic. Of course, this summary isn't the "final word", so I hope people will post corrections, clarifications, and comments. Credit is due to those people upon whose contributions this summary is based: Peter Shirley, Griff Smith, Steve Simmons, Stephen Clamage, Steve Madere, Paul Vaughan, and Bruce Hoult. THE ISSUE --------- C++ is often charged with being an inherently slower language than C. A simple case was developed where a C++ program took almost twice as long (about 170%) to run as a similar C program. It illustrates one situation where C++ appears slower than C. The example involves nearly "equivalent" matrix addition C and C++ programs. Matrix B is repeatly added to matrix A in a FOR loop. EXAMPLE ------- /******************************************/ /* C Example */ /******************************************/ #include main() { unsigned long i ; double a[ 3 ], b[ 3 ] ; a[ 0 ] = a[ 1 ] = a[ 2 ] = 0.0 ; b[ 0 ] = 1.0 ; b[ 1 ] = 2.0 ; b[ 2 ] = 3.0 ; for ( i = 0 ; i < 1000000 ; i++ ) { a[ 0 ] = a[ 0 ] + b[ 0 ] ; a[ 1 ] = a[ 1 ] + b[ 1 ] ; a[ 2 ] = a[ 2 ] + b[ 2 ] ; } fprintf( stderr, "%lf %lf %lf\n", a[ 0 ], a[ 1 ], a[ 2 ] ) ; } /******************************************/ /* C++ Example */ /******************************************/ #include #include class Vector { friend ostream& operator<<( ostream&, Vector& ) ; friend Vector operator+ ( Vector&, Vector& ) ; protected: double data[ 3 ] ; public: Vector() ; Vector( double, double, double ) ; void operator=( Vector& ) ; }; inline Vector::Vector() { } inline Vector::Vector( double a, double b, double c ) { data[ 0 ] = a ; data[ 1 ] = b ; data[ 2 ] = c ; } inline void Vector::operator=( Vector& v ) { data[ 0 ] = v.data[ 0 ] ; data[ 1 ] = v.data[ 1 ] ; data[ 2 ] = v.data[ 2 ] ; } inline Vector& operator+( Vector& u, Vector& v ) { static Vector temp ; temp.data[ 0 ] = u.data[ 0 ] + v.data[ 0 ] ; temp.data[ 1 ] = u.data[ 1 ] + v.data[ 1 ] ; temp.data[ 2 ] = u.data[ 2 ] + v.data[ 2 ] ; return temp ; } ostream& operator<<( ostream& s, vector& t ) { return s << form( "%f %f %f", t.data[ 0 ], t.data[ 1 ], t.data[ 2 ] ) ; } main() { Vector a( 0.0, 0.0, 0.0 ) ; Vector b( 1.0, 2.0, 3.0 ) ; for ( int i = 0 ; i < 1000000 ; i++ ) { a = a + b ; } cout << a << endl ; } REASON FOR SLOWER PERFORMANCE ----------------------------- The C program is written in terms of basic operations that the compiler can easily optimize while the C++ version isn't. The C++ version ends up having to construct and copy a temporary class Vector object. This extra work accounts for the difference in run time. When the form a = a + b ; is used, since `a' is a structure, the `+' operator first has to store the sum in hidden (contained in the emitted cfront C code), temporary matrix storage, then the `=' operator must copy the contents of the temporary storage into `a'. This takes a string move at best, and a slow function call at worst. Another way of looking at it is to make the C program truely equivalent to the C++ version. The C program segment would then look something like the following to do what the C++ version actually does. for ( i = 0 ; i < 1000000 ; i++ ) { temp = calloc( 3, sizeof( *temp ) ) ; temp[ 0 ] = a[ 0 ] + b[ 0 ] ; temp[ 1 ] = a[ 1 ] + b[ 1 ] ; temp[ 2 ] = a[ 2 ] + b[ 2 ] ; a[ 0 ] = temp[ 0 ] ; a[ 1 ] = temp[ 1 ] ; a[ 2 ] = temp[ 2 ] ; free( temp ) ; } DISCUSSION ---------- The problem is that cfront isn't smart enough to construct the vector sum directly in the target variable 'a' instead of creating an intermediary temporary object. Given the two addition/assignment forms 1. a[ 0 ] = a[ 0 ] + b[ 0 ] ; 2. a[ 0 ] += b[ 0 ] ; modern compilers should identify the two cases as equivalent and generate the most efficient code, but cfront is not so fortunate. It treats both forms as different. In the first case, cfront will generates code that copies the result of the '+' operator twice on the way to its final destination. The first time a bitwise copy (the default copy constructor) is used, and the second time the defined '=' operator is used. The problem goes beyond cfront not being able to recognize equivalent forms. The sort of C code cfront generates can actually interfer with the downstream C compiler's ability of perform optimization. For example, something like int a = 0 ; for ( int i = 0 ; i < 1000000 ; i++ ) { a = a + 1 ; } when written in C is easily optimized by most C compilers to: a = 1000000 ; But the same operation can be written in C++ so that C code generated by cfront will "hide" the optimization opportunity and leave the downstream C compiler with no other choice but to generate unoptimized literal code for a lengthy loop construction. Eventually, the code optimization restrictions and problems of cfront will be resolved by one or more of the following: o cfront is modified to generate optimized C code, o cfront is changed to generate C code in a form that can be easily optimized by existing C compilers, o C compilers are developed that are better tuned to the the C code that cfront presently generates. o reliable, fully optimized C++ native code comilers are developed. WORKAROUND ---------- One way to work around the problem is to define a '+=' operator for the Vector class. There are many ways to do this. Perhaps the simplest is to define and use the '+=' operator as a member in the following way: class Vector { ... public: ... void operator+=( Vector& ) ; }; void Vector::operator+=( Vector& v ) { data[ 0 ] += v.data[ 0 ] ; data[ 1 ] += v.data[ 1 ] ; data[ 2 ] += v.data[ 2 ] ; } main() { ... for ( int i = 0 ; i < 1000000 ; i++ ) { a += b ; ... }