Xref: utzoo comp.lang.c++:5814 comp.lang.fortran:2718
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!brutus.cs.uiuc.edu!lll-winken!muslix!jac
From: jac@muslix.llnl.gov (James Crotinger)
Newsgroups: comp.lang.c++,comp.lang.fortran
Subject: Re: inline and vectorization
Keywords: C++, vector syntax, vectorization.
Message-ID: <40827@lll-winken.LLNL.GOV>
Date: 7 Dec 89 16:00:52 GMT
References: <sZTKVau00VoLMDkUw5@andrew.cmu.edu>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: jac@muslix.UUCP (James Crotinger)
Followup-To: comp.lang.c++
Organization: Lawrence Livermore National Laboratory/UC Davis
Lines: 83


  I'd also like to see some discussion about using C++ (or other
languages with vector syntax) on vectorizing machines. We're going to
be getting C++ working here soon as well, and I'm wondering if it'll
be possible to write efficient vectorizing C++ code.

  I can foresee some problems. First of all, the C code cfront produces
when inlining is often quite complex looking. I'm genuinely worried
that the C compilers that we are currently using will not be able to
vectorize the resulting code. If this is the case, then we lose big.

  However, I also have other concerns, which are generic to languages
that support vector data types (a la CFT77 and Fortran 8x). Suppose
I have a vector type and the following code:

   vector A, B, C, D, E

   E = A + B*C     // meaning elementwise multiplication
   D = A - B*C 

This is the style of programming that the vector syntax promotes. My question
is, how smart will the compilers get? Will they evaluate the common
subexpression (B*C) once or twice? With the cfront model, the B*C stuff will
end up in separate loops, and it is highly unlikely that the compiler's
common-subexpression analyzer will pick it up. I think what it boils down to
is this: will the compilers be able to do "loop jamming" on the loops that
are implied by the vector syntax? Even in Fortran, if you coded:

   do i = 1, n

     E(i) = A(i) + B(i) * C(i)
   
   end do

   do i = 1, n

     D(i) = A(i) - B(i) * C(i)

   end do

the optimizer would not eliminate the common subexpression. But in Fortran
you'd never do this (well, I'd never). The loops would be "jammed" together:

   do i = 1, n

     E(i) = A(i) + B(i) * C(i)
     D(i) = A(i) - B(i) * C(i)

   end do

Now the optimizer will optimize the heck out of this. Not only will the
B*C product only be evaluated once, but most likely A, B, and C will
only be fetched once. This latter point is not insignificant on a vector
machine. Thus even if you write:

    vector TMP = B * C

    E = A + TMP
    D = A - TMP

if the underlying C compiler (or Fortran compiler) can't figure out
how to "jam" the loops itself, this will still be less efficient
than hand-coding the jammed loop (and it takes more memory, for TMP).
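
  To make the comparison concrete, here is a rough sketch in present-day
C++ (not what cfront emits). The Vec type and the function names
vector_style and jammed are made up for illustration. Each overloaded
operator carries its own loop and returns a temporary, so the
vector-syntax version runs four separate loops at the source level,
while the hand-jammed version runs one:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type, for illustration only.
struct Vec {
    std::vector<double> v;
    explicit Vec(std::size_t n, double x = 0.0) : v(n, x) {}
    std::size_t size() const { return v.size(); }
};

// Elementwise product: one loop, one temporary.
inline Vec operator*(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Elementwise sum: one loop, one temporary.
inline Vec operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Elementwise difference: one loop, one temporary.
inline Vec operator-(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r.v[i] = a.v[i] - b.v[i];
    return r;
}

// Vector-syntax style: as written, B*C is computed in two separate loops.
void vector_style(const Vec& A, const Vec& B, const Vec& C, Vec& E, Vec& D) {
    E = A + B * C;
    D = A - B * C;
}

// Hand-jammed style: one loop, B*C computed once per element,
// and A, B, C each read once per element.
void jammed(const Vec& A, const Vec& B, const Vec& C, Vec& E, Vec& D) {
    const std::size_t n = A.size();
    assert(B.size() == n && C.size() == n && E.size() == n && D.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const double bc = B.v[i] * C.v[i];
        E.v[i] = A.v[i] + bc;
        D.v[i] = A.v[i] - bc;
    }
}
```

Both functions produce the same E and D; the question is only whether
a compiler can get from the first form to the second on its own.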

  Now then, you can't jam loops that are buried in separate subroutine
calls.  This is an instance where inlining is clearly necessary for
full optimization. The common subexpression can't even be eliminated
unless you can convince the C compiler that the call to operator*(B,C)
has no side effects! But I don't think cfront will inline this, since
it has a loop in it. Is this true? In spite of all the recent cautions
on inlining, this is a case that is pretty clear to me. If you'd
otherwise code it out in long-hand, then you've got nothing to lose
from making it an inline function, and much to gain. Certainly, it is
much preferable to a macro.
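
  As one illustration of why I'd rather have the inline function, here
is a small sketch in present-day C++. The names VMUL, vmul, counted_len
and the two calls_with_* helpers are all made up for this example. The
macro pastes its length argument into the loop condition, so that
expression is re-evaluated on every trip, while the inline function
evaluates it once at the call:

```cpp
#include <cassert>
#include <cstddef>

// Macro version: arguments are substituted textually into the loop.
#define VMUL(dst, a, b, n) \
    for (std::size_t i_ = 0; i_ < (n); ++i_) (dst)[i_] = (a)[i_] * (b)[i_]

// Inline version: arguments are evaluated once, with type checking.
inline void vmul(double* dst, const double* a, const double* b,
                 std::size_t n) {
    assert(dst && a && b);
    for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] * b[i];
}

// Hypothetical helper that counts how often the length is evaluated.
static int len_calls = 0;
inline std::size_t counted_len(std::size_t n) { ++len_calls; return n; }

// Returns the number of times the length expression ran with the macro.
int calls_with_macro() {
    double a[3] = {1, 2, 3}, b[3] = {4, 5, 6}, d[3];
    len_calls = 0;
    VMUL(d, a, b, counted_len(3));  // condition tested on every iteration
    return len_calls;
}

// Returns the number of times it ran with the inline function.
int calls_with_inline() {
    double a[3] = {1, 2, 3}, b[3] = {4, 5, 6}, d[3];
    len_calls = 0;
    vmul(d, a, b, counted_len(3));  // argument evaluated once at the call
    return len_calls;
}
```

Whether a given compiler (or cfront) actually expands vmul in place is
a separate matter; the sketch only shows the semantic difference.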

  So, can cfront inline this? Can it be fixed to do so? And if so, can
compilers be written to fully optimize this code by doing loop
jamming?  It almost seems like this type of optimization would be
easier to spot at the level of the C code, or the C++ code, rather
than after the code has been translated into some lower-level
intermediate language, where most optimizers do their work.

  Jim