Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!wuarchive!mit-eddie!uw-beaver!sumax!amc-gw!jimm
From: jimm@amc-gw.amc.com (Jim McElroy)
Newsgroups: comp.lang.c++
Subject: Re: inline and vectorization
Summary: "Smart Classes" for vector machines.
Keywords: C++, vector syntax, vectorization.
Message-ID: <1061@amc-gw.amc.com>
Date: 12 Dec 89 17:44:49 GMT
References: <sZTKVau00VoLMDkUw5@andrew.cmu.edu> <40827@lll-winken.LLNL.GOV> <2239@dataio.Data-IO.COM>
Organization: Applied Microsystems Corp.; Redmond, WA
Lines: 60


In article <40827@lll-winken.LLNL.GOV> jac@muslix.UUCP,
(James Crotinger) writes:

>   However I also have other concerns, which are generic to
>   languages that support vector data types (à la CFT77 and
>   Fortran 8x).  Suppose I have a vector type and the following
>   code:
>   
>      vector A, B, C, D, E
>      E = A + B*C     // meaning elementwise multiplication
>      D = A - B*C 
>   
>   This is the style of programming that the vector syntax
>   promotes.
   
The kind of optimization you are hoping for would be very
difficult to provide in a language that tries to be general
purpose.  There are many examples of operations on classes that
would not accomplish the programmer's intended effect if the
compiler began omitting operations, saving intermediate results,
or even altering the order of operations across statement
boundaries.  To do any of this safely, the compiler would have to
know that the operations (*, +, etc.) had no side effects that
the programmer was counting on.
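
To make the side-effect concern concrete, here is a minimal C++
sketch.  The Vec type, the g_multiply_count counter and update()
are hypothetical illustrations, not part of any library.  Because
operator* has an effect the program can observe, running the two
quoted statements is entitled to show two multiplications, and
nothing in the code tells the compiler it may compute B*C once
and reuse it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal hypothetical vector type (not from the post).
struct Vec {
    std::vector<double> data;
};

// Hypothetical instrumentation: each elementwise multiply is counted.
// A program may legitimately rely on this count (profiling, logging, ...).
int g_multiply_count = 0;

// Apply a binary operation elementwise to two equal-length vectors.
template <class Op>
Vec zip(const Vec& a, const Vec& b, Op op) {
    assert(a.data.size() == b.data.size());
    Vec r{std::vector<double>(a.data.size())};
    for (std::size_t i = 0; i < a.data.size(); ++i)
        r.data[i] = op(a.data[i], b.data[i]);
    return r;
}

Vec operator+(const Vec& a, const Vec& b) {
    return zip(a, b, [](double x, double y) { return x + y; });
}

Vec operator-(const Vec& a, const Vec& b) {
    return zip(a, b, [](double x, double y) { return x - y; });
}

Vec operator*(const Vec& a, const Vec& b) {
    ++g_multiply_count;  // side effect: each call is observable
    return zip(a, b, [](double x, double y) { return x * y; });
}

// The quoted example.  B*C is written twice; since operator* has a
// side effect, merging the two calls would change what the program sees.
void update(const Vec& A, const Vec& B, const Vec& C, Vec& D, Vec& E) {
    E = A + B * C;
    D = A - B * C;
}
```

After update() runs, g_multiply_count is 2, which is exactly the
behavior a "smart" compiler would break by sharing B*C.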

There is, however, a way.  Suppose we have a set of vector and
matrix classes that encapsulate the data values.  We implement
the operations (*, +, etc.) so that they are not carried out
immediately; instead, each one adds to a data structure that
records the requested operation and its operands (a "list of
operations").  This structure keeps growing until some operation
would carry a result value outside the encapsulation, that is,
outside the set of vector and matrix classes.  For example, a
function such as "matrix.output()" would require that the
appropriate value actually be calculated.

Now that a computed value is *really* needed, the big, hairy
optimizer-evaluator function (which we still have to figure out
how to write) is called to do the real work.  This function would
analyze the "code" we have built in the list of operations, study
the data dependencies, reorder the operations optimally, keep
copies of temporary results that will be reused, eliminate copies
that are not needed, and so forth.  Because the optimizer-evaluator
knows the problem domain, and might also know the peculiarities of
the machine architecture that will finally carry out the
operations, there is a very good opportunity to exploit
parallelism, even in strongly machine-dependent ways.
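
A minimal C++ sketch of the deferred-evaluation idea follows,
assuming a far simpler "optimizer-evaluator" than the one
described: operators only record nodes in a graph, an identical
(op, lhs, rhs) record is reused rather than duplicated
(hash-consing), and output() forces evaluation, caching each
node's result so a repeated B*C is computed once.  The names
Graph, LazyVec, leaf(), op(), evaluate(), output() and kernels_run
are hypothetical.  The sketch does no reordering, loop fusion or
machine-specific scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// One recorded entry in the hypothetical "list of operations".
struct Node {
    char op;                    // 'v' = leaf data, otherwise '+', '-' or '*'
    int lhs, rhs;               // operand node ids (-1 for leaves)
    std::vector<double> data;   // leaf values, or cached result once computed
    bool computed;
};

class Graph {
public:
    int kernels_run = 0;        // how many elementwise loops actually ran

    int leaf(std::vector<double> d) {
        nodes_.push_back({'v', -1, -1, std::move(d), true});
        return static_cast<int>(nodes_.size()) - 1;
    }

    // Record an operation without performing it.  An identical
    // (op, lhs, rhs) record is reused, so a repeated B*C becomes one node.
    int op(char o, int l, int r) {
        const auto key = std::make_tuple(o, l, r);
        const auto it = memo_.find(key);
        if (it != memo_.end()) return it->second;
        nodes_.push_back({o, l, r, {}, false});
        const int id = static_cast<int>(nodes_.size()) - 1;
        memo_[key] = id;
        return id;
    }

    // Forced evaluation: compute operands first, then this node, and cache
    // the result so later requests (and other users of the node) reuse it.
    const std::vector<double>& evaluate(int id) {
        Node& n = nodes_[id];   // safe: evaluation never adds nodes
        if (n.computed) return n.data;
        const std::vector<double>& a = evaluate(n.lhs);
        const std::vector<double>& b = evaluate(n.rhs);
        assert(a.size() == b.size());
        std::vector<double> r(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            switch (n.op) {
                case '+': r[i] = a[i] + b[i]; break;
                case '-': r[i] = a[i] - b[i]; break;
                default:  r[i] = a[i] * b[i]; break;  // '*'
            }
        }
        ++kernels_run;
        n.data = std::move(r);
        n.computed = true;
        return n.data;
    }

private:
    std::vector<Node> nodes_;
    std::map<std::tuple<char, int, int>, int> memo_;
};

// Handle to a node; the operators below only append to the graph.
struct LazyVec {
    Graph* g;
    int id;
    // Leaving the encapsulation (cf. "matrix.output()") forces evaluation.
    const std::vector<double>& output() const { return g->evaluate(id); }
};

LazyVec operator+(LazyVec a, LazyVec b) { assert(a.g == b.g); return {a.g, a.g->op('+', a.id, b.id)}; }
LazyVec operator-(LazyVec a, LazyVec b) { assert(a.g == b.g); return {a.g, a.g->op('-', a.id, b.id)}; }
LazyVec operator*(LazyVec a, LazyVec b) { assert(a.g == b.g); return {a.g, a.g->op('*', a.id, b.id)}; }
```

With A, B and C as leaves, building E = A + B*C and D = A - B*C
performs no arithmetic at all; calling E.output() and then
D.output() runs three loops rather than four, because the shared
B*C node is computed once.  Here the classes, not the compiler,
decide that B*C may be shared, which is safe only because these
operations are known to have no side effects.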

It is likely that such a set of classes would need to be "tuned"
when the system is ported to a new architecture, but the
application using the classes would remain unchanged.  Yay.

How far can this idea be pushed?  Loop unrolling?
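
On "tuning" and loop unrolling, one possible arrangement (a sketch
under assumptions; all names are hypothetical) is to keep the
machine-specific inner loops in small backend classes and select
one per port, so code that calls elementwise_product() does not
change.  Unrolled4Backend is a conventional four-way unroll with a
remainder loop; whether it beats the plain loop depends on the
machine and the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward kernel: r[i] = a[i] * b[i].
struct PlainBackend {
    static void mul(const double* a, const double* b, double* r, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) r[i] = a[i] * b[i];
    }
};

// Same kernel unrolled by four, plus a cleanup loop for the leftovers.
struct Unrolled4Backend {
    static void mul(const double* a, const double* b, double* r, std::size_t n) {
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            r[i]     = a[i]     * b[i];
            r[i + 1] = a[i + 1] * b[i + 1];
            r[i + 2] = a[i + 2] * b[i + 2];
            r[i + 3] = a[i + 3] * b[i + 3];
        }
        for (; i < n; ++i) r[i] = a[i] * b[i];  // remaining 0..3 elements
    }
};

// Chosen once per port (hypothetical configuration point).
using ActiveBackend = Unrolled4Backend;

// Application-facing function: unaffected by which backend is active.
std::vector<double> elementwise_product(const std::vector<double>& a,
                                        const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> r(a.size());
    ActiveBackend::mul(a.data(), b.data(), r.data(), a.size());
    return r;
}
```

Porting to a different machine then means changing the one
ActiveBackend line (or writing a new backend), not the code that
uses the vector classes.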

I hope somebody out there can work on this.


Jim McElroy