Xref: utzoo comp.lang.c:10254 comp.lang.fortran:675
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!bloom-beacon!oberon!cit-vax!elroy!ames!ncar!oddjob!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c,comp.lang.fortran
Subject: Re: no noalias not negligible (long)
Message-ID: <11608@mimsy.UUCP>
Date: 21 May 88 04:18:12 GMT
References: <54080@sun.uucp>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 104

In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>The results for double precision linpack on Sun-4 using SunOS 4.0 and
>Fortran 1.1 were [edited to just `rolled' case]:

>Fortran    1080 KFLOPS
>C           850 KFLOPS

>      subroutine daxpy(n,da,dx,dy)
>      doubleprecision dx(1),dy(1),da
>      integer i,n

Incidentally, as we have just seen in comp.arch, this Fortran version
is illegal: it should declare dx and dy as

       integer n
       double precision dx(n), dy(n)

>      do 30 i = 1,n
>        dy(i) = dy(i) + da*dx(i)
>   30 continue
>      return
>      end

>The corresponding rolled C code could be written with a for loop

>daxpy(n, da, dx, dy)
>        double          dx[], dy[], da;
>        int             n;
>{
>        int             i;
>
>        for (i = 0; i < n; i++) {
>                dy[i] = dy[i] + da * dx[i];

I suggest

                 dy[i] += da * dx[i];

as it is easier to understand.  (In a reasonably optimal C compiler
it should produce the same code.)

>        }
>}
>
>but [the] Sun compilers ... won't unroll [these] loops....  [Hand unrolling
>helped but not as much as expected.]

>Investigation revealed that the reason had to do with noalias:  [the
>Fortran [version is] defined by the Fortran standard to be "noalias",
>meaning a compiler may optimize code based on the assumption that [dy
>and dx are distinct].
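
To make the hazard concrete, here is a minimal sketch, not taken from
either article, of why the assumption matters.  The names daxpy_rolled
and daxpy_hoisted are hypothetical.  The second routine is only a
hand-written stand-in for what an unrolling optimizer might plausibly
emit if it were told that x and y never overlap; it is not the output
of any particular compiler.

```c
/* The straightforward loop: y[i] is stored before x[i+1] is loaded. */
void daxpy_rolled(int n, double a, const double *x, double *y)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/*
 * Unrolled by two, with both loads of x issued ahead of either store
 * to y.  This reordering is valid only if x and y do not overlap.
 */
void daxpy_hoisted(int n, double a, const double *x, double *y)
{
    int i;

    for (i = 0; i + 1 < n; i += 2) {
        double x0 = x[i];
        double x1 = x[i + 1];   /* loaded before y[i] is stored */

        y[i] += a * x0;
        y[i + 1] += a * x1;
    }
    if (i < n)                  /* leftover element when n is odd */
        y[i] += a * x[i];
}
```

With distinct arrays the two routines agree.  With an overlapping call
such as x = buf and y = buf + 1 over a five-element buf of all ones
(n = 4, a = 1.0), the rolled version leaves 1 2 3 4 5 in buf while the
hoisted version leaves 1 2 2 3 2, which is why a C compiler that cannot
rule out overlap has to keep the original load/store order.
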
[X3J11's `noalias' proposal was deleted for various reasons including]

>3)  optimizing compilers should be able to figure out if aliasing
>exists, which is definitely false in a separate compilation environment
>(unless you want the linker to recompile everything, in which case the
>linker is the compiler, and you're back to no separate compilation).

This is not quite right:  The linker is to be the *code generator*,
not the *compiler*.  Code generation is a (relatively) small subset
of the task of compilation.  Naturally, a code-generating linker will
take longer to run than a simple linking linker, which discourages
this somewhat.  The usual solution is to generate code in the compiler
proper only when it is told not to optimise.

>Anyway there is no portable way in draft ANSI C to say "this pointers
>are guaranteed to have no aliases".

>... you don't dare load dx[i+1] before you store dy[i] if there is
>any danger that they point to the same place.

True.

>What is to be done?

Ignore it.  (Unsatisfactory.)

Provide code-generating linkers.  (Good idea but hard to do.)

Provide `unsafe' optimisation levels.  (Generally a bad idea, but
easier than code generation at link time, and typically produces
faster compile times.)

Provide `#pragma's.  Some people claim that a pragma is not allowed
to declare such semantics as volatility or lack of aliasing; I
disagree.  Short of the code-generating linker, with aliasing and
register allocation computed at `link' time, this seems to me the
best solution.

        /*
         * Double precision `ax + y' (d a*x plus y => daxpy).
         */
        void
        daxpy(n, a, x, y)
                register int n;
                register double a, *x, *y;
        #pragma notaliased(x, y) /* or #pragma separate, or #pragma distinct, or... */
        {

                while (--n >= 0)
                        *y++ += a * *x++;
        }

to write it in C-idiom-ese.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris