Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!sri-unix!teknowledge-vaxc!uw-beaver!tektronix!tekcrl!tekchips!stevev
From: stevev@tekchips.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: Optimization vs. the programmer
Message-ID: <1176@tekchips.TEK.COM>
Date: Thu, 9-Apr-87 11:51:13 EST
Article-I.D.: tekchips.1176
Posted: Thu Apr 9 11:51:13 1987
Date-Received: Sat, 11-Apr-87 14:23:39 EST
References: <479@danews.ATT.COM> <16294@sun.uucp> <484@danews.ATT.COM>
Organization: Tektronix Inc., Beaverton, Or.
Lines: 43
Xref: utgpu comp.arch:821 comp.lang.c:1536
Summary: optimization is in "eye" of the architecture

In article <484@danews.ATT.COM>, lvc@danews.ATT.COM (Larry Cipriani) writes:
> My preference is for a program (perhaps a phase of an optimizing
> compiler) that would take source code and generate optimized *source*
> code. Additionally, messages saying why the transformations are
> better would be great to have. I believe such programs are available
> for Fortran (or do these concentrate on vectorization) but I've never
> seen one for C.

Even though a source-level "optimization" is machine-independent in that
its legality does not depend on the architecture, whether any particular
optimization will improve the quality of the code cannot be known unless
you know something about the architecture. The types of source-level
optimizations done for a parallel architecture might be quite different
from those done for a traditional uniprocessor.

An example of an "optimization" that could turn out to decrease code
quality is common subexpression elimination. To save the value of the
common subexpression requires the allocation of an extra register. It's
possible that the negative effect of tying up the extra register more
than cancels the gain from eliminating the redundant computation. You
need information about the particular architecture you are targeting in
order to make this decision.

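To make the register trade-off concrete, here is a minimal C sketch of
what a source-level common subexpression elimination might do. The
function names and the expression are made up for illustration; they do
not come from any particular compiler or tool. Both versions compute the
same value; the second keeps the temporary t live across two uses, which
is where the extra register goes.

```c
#include <assert.h>

/* Hypothetical "before" version: the subexpression a * b is written,
   and naively computed, twice. */
int without_cse(int a, int b, int c, int d)
{
    return (a * b + c) * (a * b + d);
}

/* Hypothetical "after" version: a * b is computed once and saved in t.
   t must stay live until its second use, so it typically ties up a
   register (or is spilled to memory if none is free). */
int with_cse(int a, int b, int c, int d)
{
    int t = a * b;
    return (t + c) * (t + d);
}

/* The transformation is legal on any machine: both forms agree.
   Whether it is a win depends on the target. */
void check_cse_equivalence(void)
{
    int a, b;

    for (a = -3; a <= 3; a++)
        for (b = -3; b <= 3; b++)
            assert(without_cse(a, b, 5, 6) == with_cse(a, b, 5, 6));
}
```

For example, with_cse(3, 4, 5, 6) and without_cse(3, 4, 5, 6) both
return 306. Nothing in the source says which one runs faster: on a
machine with few registers and a cheap multiply, saving t could cost
more than recomputing a * b.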
Another example: I once programmed on an architecture in which the
fastest way to produce certain constants--due to the architecture's
addressing modes--was to "unfold" the constant into two simpler ones
(e.g., 1072 becomes "134 lsh 3"). Thus, even constant folding is not
necessarily always the optimal thing to do.

A microarchitecture that I was marginally familiar with had such a fast
multiplier that the fastest way to perform a shift of more than some
very small number of bits was to transform it into an equivalent
multiplication by a power of two. On such an architecture, a strength
reduction that transforms a multiplication into a shift might DECREASE
code quality.

All this aside, IF you have a pretty good idea what kind of architecture
you're targeting, source-level optimization can generally be quite
effective.

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon