Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!wuarchive!mit-eddie!uw-beaver!sumax!amc-gw!jimm
From: jimm@amc-gw.amc.com (Jim McElroy)
Newsgroups: comp.lang.c++
Subject: Re: inline and vectorization
Summary: "Smart Classes" for vector machines.
Keywords: C++, vector syntax, vectorization.
Message-ID: <1061@amc-gw.amc.com>
Date: 12 Dec 89 17:44:49 GMT
References: <40827@lll-winken.LLNL.GOV> <2239@dataio.Data-IO.COM>
Organization: Applied Microsystems Corp.; Redmond, WA
Lines: 60

In article <40827@lll-winken.LLNL.GOV> jac@muslix.UUCP, (James Crotinger) writes:
> However I also have other concerns, which are generic to
> languages that support vector data types (ala CFT77 and
> Fortran 8x). Suppose I have a vector type and the following
> code:
>
>     vector A, B, C, D, E
>     E = A + B*C // meaning elementwise multiplication
>     D = A - B*C
>
> This is the style of programming that the vector syntax
> promotes.

The kinds of optimizations that you are hoping for would be very
difficult for a language that tries to be general purpose. There are
many examples of operations on classes that would not accomplish the
programmer's intended effect if the compiler began omitting
operations, saving intermediate results or even altering the order of
operations across statement boundaries. The compiler would have to
know that the operations (*, +, etc.) had no side effects that the
programmer was counting on.

There is, however, a way. Suppose we have a set of vector and matrix
classes that encapsulate the data values. We then implement the
operations (*, +, etc.) so that the operations are not carried out
immediately, but instead, a data structure is built that records the
desired operations and operands. (A "list of operations" structure.)
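A minimal sketch of that recording step, in present-day C++ rather
than anything a 1989 compiler would accept. Every name in it (Node,
Vec, evaluate) is hypothetical and chosen only for illustration, and
the evaluator is deliberately naive: it walks the recorded operations
once per element and makes no attempt at the reordering or reuse that
a real optimizer-evaluator would do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// All names here (Node, Vec, evaluate) are hypothetical.

// One entry in the recorded "list of operations": either a leaf that
// holds data, or a binary operation pointing at its two operands.
struct Node {
    enum class Op { Leaf, Add, Sub, Mul };
    Op op;
    std::vector<double> data;          // meaningful only for Leaf
    std::shared_ptr<const Node> lhs;   // null for Leaf
    std::shared_ptr<const Node> rhs;   // null for Leaf
    std::size_t length;                // element count of the result
};

class Vec {
public:
    explicit Vec(std::vector<double> values) {
        const std::size_t n = values.size();  // read before moving
        node_ = std::make_shared<Node>(
            Node{Node::Op::Leaf, std::move(values), nullptr, nullptr, n});
    }

    // The operators do no arithmetic; they only record the request.
    friend Vec operator+(const Vec& a, const Vec& b) { return Vec(Node::Op::Add, a, b); }
    friend Vec operator-(const Vec& a, const Vec& b) { return Vec(Node::Op::Sub, a, b); }
    friend Vec operator*(const Vec& a, const Vec& b) { return Vec(Node::Op::Mul, a, b); }

    // Carrying a value outside the encapsulation forces the work.
    // One pass over the elements; no temporary vector is built for
    // an intermediate such as B*C.
    std::vector<double> evaluate() const {
        std::vector<double> out(node_->length);
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = element(*node_, i);
        }
        return out;
    }

private:
    // Records a binary operation on two already-recorded operands.
    Vec(Node::Op op, const Vec& a, const Vec& b) {
        assert(a.node_->length == b.node_->length);
        node_ = std::make_shared<Node>(
            Node{op, {}, a.node_, b.node_, a.node_->length});
    }

    // Computes element i of the value that node n stands for.
    static double element(const Node& n, std::size_t i) {
        switch (n.op) {
            case Node::Op::Leaf: return n.data[i];
            case Node::Op::Add:  return element(*n.lhs, i) + element(*n.rhs, i);
            case Node::Op::Sub:  return element(*n.lhs, i) - element(*n.rhs, i);
            case Node::Op::Mul:  return element(*n.lhs, i) * element(*n.rhs, i);
        }
        return 0.0;  // not reached for a valid Op
    }

    std::shared_ptr<const Node> node_;
};
```

With A, B and C built from ordinary arrays, "Vec E = A + B*C;" only
records two operations; nothing is computed until E.evaluate() is
called. Note that building D = A - B*C as well records B*C a second
time. Recognizing that the two are the same is exactly the sort of job
this sketch leaves undone.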
This structure would continue to be added to until some operation is
called for that would carry a result value outside of the
encapsulation -- that is, outside of the set of vector and matrix
classes. (For example, a function such as "matrix.output()" would
require that the appropriate value actually be calculated!)

Now that a computed value is *really* needed, the big, hairy,
optimizer-evaluator function (that we have to figure out how to
write) is called to do the real work. This function would analyze the
"code" that we have built in the "list of operations", study the data
dependencies, reorder the operations optimally, make necessary copies
of temporary results for later reuse, eliminate copies that are not
needed and so forth.

Now this optimizer-evaluator function knows the problem domain and
might also know the peculiarities of the machine architecture that
will finally carry out the operations. There is thus a very good
opportunity to exploit parallelism even in strongly machine-dependent
ways.

It is likely that such a set of classes would need to be "tuned" when
the system is ported to a new architecture, but the application using
the classes would remain unchanged. Yay.

How far can this idea be pushed? Loop unrolling? I hope somebody out
there can work on this.

Jim McElroy