Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!haven!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: Fortran computes cosine 300 times faster than C (on Sun3)
Keywords: Fortran, C, cosine, speed
Message-ID: <16279@mimsy.UUCP>
Date: 8 Mar 89 14:27:22 GMT
References: <765@uceng.UC.EDU>
Distribution: na
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 49

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
>(of course, on Sun4 the -f68881 flag was not used.) The results are similar
>on both machines. Can anyone enlighten me on this bizarre result?

`COS' is an intrinsic function in Fortran.  This means that the
compiler is required to know about it.  In C it is typically provided
as an external function, so the compiler knows nothing of it.  Thus:

>        for(i=0;i<262144;i++)
>            tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

makes the compiler call `cos' (262144*4) times, each time with the same
argument, and multiply all those values together.  The compiler cannot
`guess at' the function and, because its value is not used the first
262143 times, eliminate the call: for all it knows, `cos' might print
`hello world'.  On the other hand, given

>        do 10 i=1,262144
>            tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
>10      continue

the Fortran compiler can be certain that COS(2.5) does nothing but
compute cosines, and can change the code to

                TMP = COS(2.5)**4
        10      CONTINUE

possibly even replacing the COS(2.5) with the constant -.8011436155....

(Actually, since in both fragments tmp is unused, both versions can
elide the assignment to tmp, and the C version can elide the three
multiplies per iteration.  It cannot, however, replace the four calls
with a single call.)

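To make the call counts concrete, here is a minimal sketch in C.  The
names `counted_cos', `ncalls', `as_written' and `hoisted' are all made
up for illustration; `counted_cos' is only a stand-in for an external
`cos' and uses a short Taylor series so that the fragment needs nothing
from the math library.  The counter plays the part of the `hello world'
side effect that the compiler has to assume might exist.

```c
/* Hypothetical stand-in for an external `cos'.  It counts its calls,
   a visible side effect of the kind the compiler must assume any
   unknown external function could have. */
unsigned long ncalls = 0;

double counted_cos(double x)
{
    double term = 1.0, sum = 1.0;
    int k;

    ncalls++;
    /* Taylor series about 0; 20 terms is ample for x = 2.5 */
    for (k = 1; k <= 20; k++) {
        term *= -x * x / ((2.0 * k - 1.0) * (2.0 * k));
        sum += term;
    }
    return sum;
}

/* The loop as posted: four calls per iteration, 4*n in all. */
double as_written(long n)
{
    double tmp = 0.0;
    long i;

    for (i = 0; i < n; i++)
        tmp = counted_cos(2.5) * counted_cos(2.5)
            * counted_cos(2.5) * counted_cos(2.5);
    return tmp;
}

/* Hoisted by hand: one call, then three multiplies. */
double hoisted(void)
{
    double c = counted_cos(2.5);

    return c * c * c * c;
}
```

Calling as_written(262144L) leaves ncalls at 4*262144, while hoisted()
bumps it exactly once and returns the same value.  This is the rewrite
a programmer can make by hand today, and the one a compiler may make
only when it knows the function has no side effects.
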
Now, if Sun had a pANS-conformant compiler, they could do something
like

        #define cos(x) __intrinsic_cos(x)

and recognise calls to `__intrinsic_cos'.  This sort of optimisation
does have a real effect on real code (as opposed to silly examples
like calling cos four times with the same constant in a loop that runs
262144 times, then throwing away the result).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris