Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!amdcad!crackle!tim
From: tim@crackle.amd.com (Tim Olson)
Newsgroups: comp.lang.c
Subject: Re: Fortran computes cosine 300 times faster than C (on Sun3)
Message-ID: <24764@amdcad.AMD.COM>
Date: 8 Mar 89 18:30:12 GMT
References: <765@uceng.UC.EDU>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Distribution: na
Organization: Advanced Micro Devices, Inc. Sunnyvale CA
Lines: 85

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
| I chanced upon a segment of code that runs approximately 300 times faster in
| FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
| (of course, on Sun4 the -f68881 flag was not used.) The results are similar
| on both machines. Can anyone enlighten me on this bizarre result?

Welcome to the world of benchmarking.

You can see what happened if you look at the assembly language
generated by the compilers.  In the FORTRAN version, there is no call to
the cosine routine; only an empty loop remains.  This is because cosine
is a FORTRAN intrinsic, so the compiler knows it has no side effects.
Since you didn't use any of the results of the cosine calls, the
compiler was able to eliminate them entirely as "dead code".
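
To make the dead-code point concrete, here is a minimal C sketch.  It is
not the code from the original article, and the names square,
loop_unused, and loop_used are made up for illustration.  An optimizer
that can see that square has no side effects is free to reduce
loop_unused to nothing, but it must keep the arithmetic in loop_used
because the caller receives the sum.  Whether a particular compiler
actually removes the loop depends on the compiler and its flags.

```c
/* Hypothetical helper: visibly free of side effects. */
static double square(double x)
{
    return x * x;
}

/* The results are never used, so an optimizer may delete the whole loop. */
void loop_unused(int n)
{
    int i;

    for (i = 0; i < n; i++)
        (void)square(2.5);
}

/* The sum is returned, so the work is live and must be performed. */
double loop_used(int n)
{
    int i;
    double sum = 0.0;

    for (i = 0; i < n; i++)
        sum += square(2.5);
    return sum;
}
```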

The C version had to keep the cosine function calls because cos() isn't
an intrinsic function in K&R C.  It is an ordinary library routine, so
the compiler knows nothing about what it does (it may have side effects).
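
A minimal sketch of why an unknown function cannot simply be dropped
(the names call_count, noisy_half, and discard_results are hypothetical,
for illustration only).  The function below updates a counter in
addition to returning a value.  If it lived in a separately compiled
library, the compiler would see only its declaration and would have to
assume something like this might be happening, so it keeps every call
even when the return value is thrown away.

```c
/* Hypothetical global, standing in for any externally visible state. */
int call_count = 0;

/* Stand-in for a library routine: returns a value AND has a side effect. */
double noisy_half(double x)
{
    call_count++;
    return x / 2.0;
}

/* The return values are discarded, yet each call still changes call_count. */
void discard_results(int n)
{
    int i;

    for (i = 0; i < n; i++)
        (void)noisy_half(2.5);
}
```

After discard_results(3), call_count is 3, so removing the calls would
change what the program does.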

To get more realistic numbers, you have to "fake out" the compiler by
using the results of the calls:

________________________________________
/*
 * Compile using:
 *      cc -f68881 -O -o cosc cosc.c -lm
 */

#include <math.h>

/* Accumulate the results so the cos() calls cannot be treated as dead code. */
float bench()
{
	int i;
	float tmp;

	for (tmp = 0.0, i = 0; i < 262144; i++)
		tmp += cos(2.5) * cos(2.5) * cos(2.5) * cos(2.5);
	return tmp;
}

int main()
{
	float tmp;

	tmp = bench();
	return 0;
}
________________________________________
c               f77 -f68881 -O -o cosf cosf.f
c
      real function bench()
      integer i
      real tmp

      tmp = 0.0
      do 10 i = 1, 262144
         tmp = tmp + cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
10    continue

      bench = tmp
      end


      program cosf
      real tmp1

      tmp1 = bench()
      end
________________________________________

On a Sun 4/110:

crackle49 time cosc
35.3u 0.5s 0:37 95% 0+144k 1+0io 2pf+0w
crackle50 time cosf
19.4u 0.3s 0:20 96% 0+232k 0+0io 0pf+0w

This difference is mainly due to the floating-point math being performed
in double precision in C (K&R C promotes float operands to double, and
cos() takes and returns a double) vs. single precision in FORTRAN.
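
A small sketch of the type rules behind this, using sizeof so that
nothing is actually computed.  It assumes a C99-or-later <math.h> that
provides cosf, which was not available in the K&R-era library used for
the timings above; the function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* cos() returns a double, even if the result is later stored in a float. */
size_t size_of_cos_result(void)
{
    return sizeof(cos(2.5));
}

/* Assumption: C99 cosf() is available; it returns a float. */
size_t size_of_cosf_result(void)
{
    return sizeof(cosf(2.5f));
}

/* Mixing a float with a double converts the float operand to double. */
size_t size_of_mixed_expr(void)
{
    float tmp = 0.0f;

    return sizeof(tmp + cos(2.5));
}
```

The first and third sizes match sizeof(double), so the accumulation in
the C bench() is carried out in double precision and only narrowed when
it is stored back into tmp.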


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)