Xref: utzoo comp.lang.c:10419 comp.lang.fortran:692
Path: utzoo!attcan!uunet!mcvax!diku!njk
From: njk@diku.dk (Niels Jørgen Kruse)
Newsgroups: comp.lang.c,comp.lang.fortran
Subject: Re: no noalias not negligible - a difference between C and Fortran - long
Message-ID: <3845@diku.dk>
Date: 25 May 88 20:12:08 GMT
References: <54080@sun.uucp>
Organization: DIKU, U of Copenhagen, DK
Lines: 50

In article <54080@sun.uucp>, dgh%dgh@Sun.COM (David Hough) writes:
>(...)
> it appeared to still be written in Fortran.  But faithful preservation of
> Fortran semantics, including memory access patterns, was one of the goals
> of the translation.

Since nobody else commented on this ...

If you didn't want to preserve memory access patterns so badly, you
could have done some hand scheduling on the unrolled loop
(similarly simplified):

daxpy(n, da, dx, dy)
        double dx[], dy[], da;
        int n;
{
        int i;
        double a, b, c, d;

        /*
         * Unrolled by 4, so step i by 4.  As part of the
         * simplification there is no cleanup loop: n is
         * assumed to be a multiple of 4.
         */
        for (i = 0; i < n; i += 4) {
                /*
                 * Compute 4 independent expressions into
                 * registers a,b,c,d.
                 */
                a = dy[i]   + da * dx[i];
                b = dy[i+1] + da * dx[i+1];
                c = dy[i+2] + da * dx[i+2];
                d = dy[i+3] + da * dx[i+3];
                /*
                 * Store results back.
                 */
                dy[i]   = a;
                dy[i+1] = b;
                dy[i+2] = c;
                dy[i+3] = d;
        }
}

This alleviates the constraints imposed on scheduling by potential
aliasing.  All the loads in an iteration now precede all the stores,
so the first store can be scheduled as soon as the last load has
completed.  Provided enough registers are available to schedule the
loads into, enough time should be left before the computations finish
to catch up on the stores.

If the Fortran compiler doesn't unroll more than 4 times, I see no
reason why this should be slower than the rolled Fortran version.
On the other hand, I see no reason why the unrolled Fortran version
should be slower than the rolled version either, so what do I know.

        Niels Jørgen Kruse
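
The scheduling argument above can be tried out with a small sketch. It is
not the code from the post: it assumes ANSI C prototypes, adds a cleanup
loop so that n need not be a multiple of 4, and uses the hypothetical names
daxpy_rolled and daxpy_unrolled4. It is meant only to show that the
load-everything-then-store ordering matches the rolled loop when dx and dy
do not overlap, and differs when they do.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled reference loop: each store happens before the next loads. */
void daxpy_rolled(int n, double da, const double *dx, double *dy)
{
    int i;

    for (i = 0; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}

/*
 * Hand-scheduled variant (hypothetical name): four results are computed
 * into locals before any of them is stored, so within one group of four
 * no load has to wait behind a store that might alias it.
 */
void daxpy_unrolled4(int n, double da, const double *dx, double *dy)
{
    int i;
    double a, b, c, d;

    assert(dx != NULL && dy != NULL);

    /* Main loop: runs while at least 4 elements remain. */
    for (i = 0; n - i >= 4; i += 4) {
        a = dy[i]     + da * dx[i];
        b = dy[i + 1] + da * dx[i + 1];
        c = dy[i + 2] + da * dx[i + 2];
        d = dy[i + 3] + da * dx[i + 3];

        dy[i]     = a;
        dy[i + 1] = b;
        dy[i + 2] = c;
        dy[i + 3] = d;
    }

    /* Cleanup loop (an addition, not in the post): leftover 0..3 elements. */
    for (; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}
```

With non-overlapping arrays both functions leave the same values in dy.
If the arrays overlap, for example dx = r and dy = r + 1 on
r = {1, 1, 1, 1, 1} with da = 1 and n = 4, the rolled loop leaves
{1, 2, 3, 4, 5} while the hand-scheduled one leaves {1, 2, 2, 2, 2}. That
difference is why a C compiler cannot reorder the loads and stores on its
own, and why the reordering is safe for a Fortran compiler, which may
assume that an argument being assigned to is not aliased.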