Xref: utzoo comp.lang.c++:5837 comp.lang.fortran:2727
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!lll-winken!muslix!jac
From: jac@muslix.llnl.gov (James Crotinger)
Newsgroups: comp.lang.c++,comp.lang.fortran
Subject: Re: inline and vectorization
Message-ID: <40997@lll-winken.LLNL.GOV>
Date: 8 Dec 89 23:03:30 GMT
References: <40827@lll-winken.LLNL.GOV>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: jac@muslix.UUCP (James Crotinger)
Followup-To: comp.lang.c++
Organization: Lawrence Livermore National Laboratory/UC Davis
Lines: 53

In article mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
>This is also the style of programming that is appropriate to
>memory-to-memory vector machines (Cyber 205 and ETA-10), and (more
>importantly) for SIMD parallel machines like the Connection Machine.
>The code above runs at the same speed on the ETA-10 (for example)
>whether B*C is pre-calculated or not, since the extra multiply can be
>completely overlapped with the subtract in the second line.
>

  I guess I find that a bit hard to buy. The X-MP also does chaining,
but the above example runs 33% slower on our X-MP when written using
Cray Fortran's vector notation than it does when the loop is written
out explicitly, with both loops jammed into one. Interestingly, on the
Cray 2, which does no chaining, the vector style version is only 20%
slower.

  I suspect that the memory bandwidth is what's really the killer
here. In the version which is written out as one jammed loop, the Cray
should do the following:

   loop:
      load 64 elements of A
      load 64 elements of B
      load 64 elements of C
      calculate E = A + B*C  (for 64 elements)
      calculate D = A - B*C  (ditto)
      store E                (ditto)
      store D                (ditto)
      goto loop (with appropriate logic to end the loop).

The savings of not having to go out to memory to get A, B, and C twice
are not small at all. Furthermore, on the Cray 2, stuff like storing E
can be overlapped with the calculation of D...
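
  For readers following along in comp.lang.c++, here is a minimal
sketch of the two forms being compared. It is only an illustration
under assumptions: the names two_pass, jammed, Vec and the two
words_* constants are made up for this example, and the two-pass
version is a guess at what one whole-array statement per result
amounts to, not the code from the earlier article. The word counts
assume that nothing fetched in the first pass is still on hand for
the second.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // hypothetical stand-in for a Fortran array

// Two-pass form: one loop per result, as with one whole-array
// statement each for E and D. A, B and C are read from memory twice.
void two_pass(const Vec& a, const Vec& b, const Vec& c, Vec& e, Vec& d) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) e[i] = a[i] + b[i] * c[i];
    for (std::size_t i = 0; i < n; ++i) d[i] = a[i] - b[i] * c[i];
}

// Jammed form: both results come out of a single loop, so each
// operand is read once and B*C is computed once per element.
void jammed(const Vec& a, const Vec& b, const Vec& c, Vec& e, Vec& d) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double ai = a[i];
        const double bc = b[i] * c[i];  // common subexpression, evaluated once
        e[i] = ai + bc;
        d[i] = ai - bc;
    }
}

// Memory words moved per element (loads plus stores), assuming no
// reuse between the two passes of the first version.
constexpr int words_two_pass = (3 + 1) + (3 + 1);  // A,B,C + E, then A,B,C + D
constexpr int words_jammed   = 3 + 2;              // A,B,C once, then E and D
```

Both functions expect e and d to be sized to match a already. The
word counts are a tally of memory traffic only, not a timing
prediction; how much of the difference shows up in run time depends
on the machine.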
>
>>My question
>>is, how smart will the compilers get. Will compilers evaluate the common
>>subexpression (B*C) once or twice?
>
>I don't know of *any* vectorizer/optimizer which will do this sort of
>optimization on vector quantities. Anyone from Cray care to comment on
>the current status of the Cray compiler on this code?
>
>It is *very* important that this capability be developed, since more
>and more machines are going to be memory-bandwidth-deficient in the
>next few years.
>

  Exactly.

>John D. McCalpin - mccalpin@masig1.ocean.fsu.edu
>                   mccalpin@scri1.scri.fsu.edu
>                   mccalpin@delocn.udel.edu

  Jim