Xref: utzoo comp.lang.c++:5816 comp.lang.fortran:2719
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!stat!stat.fsu.edu!mccalpin
From: mccalpin@masig3.ocean.fsu.edu (John D. McCalpin)
Newsgroups: comp.lang.c++,comp.lang.fortran
Subject: Re: inline and vectorization
Message-ID:
Date: 7 Dec 89 15:57:39 GMT
References: <40827@lll-winken.LLNL.GOV>
Sender: news@stat.fsu.edu
Followup-To: comp.lang.c++
Organization: Supercomputer Computations Research Institute
Lines: 88
In-reply-to: jac@muslix.llnl.gov's message of 7 Dec 89 16:00:52 GMT

In article <40827@lll-winken.LLNL.GOV> jac@muslix.UUCP (James Crotinger) writes:

>  However I also have other concerns, which are generic to languages
>that support vector data types (ala CFT77 and Fortran 8x). Suppose
>I have a vector type and the following code:
>     vector A, B, C, D, E
>     E = A + B*C      // meaning elementwise multiplication
>     D = A - B*C
>This is the style of programming that the vector syntax promotes.

This is also the style of programming that is appropriate to
memory-to-memory vector machines (Cyber 205 and ETA-10), and (more
importantly) for SIMD parallel machines like the Connection Machine.

The code above runs at the same speed on the ETA-10 (for example)
whether B*C is pre-calculated or not, since the extra multiply can be
completely overlapped with the subtract in the second line.

Of course coding for the ETA-10 is not an interesting issue for most
of us these days, but I consider it very important to maintain a
reasonable level of source code compatibility between codes for the
Connection Machine and the Cray Y/MP and for other machines which have
insufficient memory bandwidth to run vector operations in the
streaming mode shown above.  These machines include: Cray-2, Convex,
IBM 3090, most (all?) Japanese supercomputers, Ardent Titan, as well
as most machines on the drawing boards (names withheld since I am
under non-disclosure on several of these.)
Even the Cray X/MP and Y/MP benefit from reducing the memory traffic
since this minimizes the bank conflicts suffered in a multi-processing
environment.

>My question
>is, how smart will the compilers get. Will compilers evaluate the common
>subexpression (B*C) once or twice?

I don't know of *any* vectorizer/optimizer which will do this sort of
optimization on vector quantities.  Anyone from Cray care to comment
on the current status of the Cray compiler on this code?

It is *very* important that this capability be developed, since more
and more machines are going to be memory-bandwidth-deficient in the
next few years.

>With the cfront model, the B*C stuff will
>end up in separate loops and it is highly unlikely that the compiler's
>subexpression analyzer will pick it up. I think what it boils down to is
>this: will the compilers be able to do "loop jamming" on the loops that
>are implied by the vector syntax. Even in Fortran, if you coded:
>     do i = 1, n
>        E(i) = A(i) + B(i) * C(i)
>     end do
>     do i = 1, n
>        D(i) = A(i) - B(i) * C(i)
>     end do
>the optimizer would not eliminate the common subexpression. But in fortran
>you'd never do this (well, I'd never). The loops would be "jammed" together:
>     do i = 1, n
>        E(i) = A(i) + B(i) * C(i)
>        D(i) = A(i) - B(i) * C(i)
>     end do

And converting array notation into this combined form requires a
significant data dependency analysis....

I think that part of the problem is that vectorizers have been
developed as stand-alone source-to-source translators, and the
optimization aspect of this translation has been pretty minimal to
date.

>Now the optimizer will optimize the heck out of this. Not only will the
>B*C product only be evaluated once, but most likely A, B, and C will
>only be fetched once. This latter point is not insignificant on a vector
>machine.
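
For the C++ readers, the difference between the two Fortran forms
quoted above can be sketched roughly as follows.  This is only an
illustration in present-day C++, not what cfront or any particular
vectorizer emits.  The names Vec, separate_loops and jammed_loop are
made up for the example, and it assumes the output vectors do not
overlap the inputs (establishing that is the job of the dependency
analysis mentioned above).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a "vector" type; not from any real library.
using Vec = std::vector<double>;

// Unjammed form: one loop per vector statement.  B[i]*C[i] is computed
// in both loops, and A, B and C are each streamed from memory twice.
void separate_loops(const Vec& A, const Vec& B, const Vec& C,
                    Vec& E, Vec& D) {
    const std::size_t n = A.size();
    assert(B.size() == n && C.size() == n);
    assert(E.size() == n && D.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        E[i] = A[i] + B[i] * C[i];
    }
    for (std::size_t i = 0; i < n; ++i) {
        D[i] = A[i] - B[i] * C[i];
    }
}

// Jammed form: a single pass.  Each input element is loaded once and
// the product is reused for both results.
// Assumption: E and D do not overlap A, B or C.
void jammed_loop(const Vec& A, const Vec& B, const Vec& C,
                 Vec& E, Vec& D) {
    const std::size_t n = A.size();
    assert(B.size() == n && C.size() == n);
    assert(E.size() == n && D.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const double a  = A[i];
        const double bc = B[i] * C[i];   // common subexpression, computed once
        E[i] = a + bc;
        D[i] = a - bc;
    }
}
```

Both functions produce the same E and D for non-overlapping
arguments; what changes is the number of passes over memory, which is
the quantity that matters on the bandwidth-limited machines.
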
>Thus even if you write:
>     vector TMP = B * C
>     E = A + TMP
>     D = A - TMP
>if the underlying C compiler (or fortran compiler) can't figure out
>how to "jam" the loops itself, this will still be less efficient
>than hand coding the loop (and it takes more memory).

yep....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
                   mccalpin@scri1.scri.fsu.edu
                   mccalpin@delocn.udel.edu