Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!sri-unix!teknowledge-vaxc!uw-beaver!tektronix!tekcrl!tekchips!stevev
From: stevev@tekchips.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: Optimization vs. the programmer
Message-ID: <1176@tekchips.TEK.COM>
Date: Thu, 9-Apr-87 11:51:13 EST
Article-I.D.: tekchips.1176
Posted: Thu Apr  9 11:51:13 1987
Date-Received: Sat, 11-Apr-87 14:23:39 EST
References: <479@danews.ATT.COM> <16294@sun.uucp> <484@danews.ATT.COM>
Organization: Tektronix Inc., Beaverton, Or.
Lines: 43
Xref: utgpu comp.arch:821 comp.lang.c:1536
Summary: optimization is in "eye" of the architecture

In article <484@danews.ATT.COM>, lvc@danews.ATT.COM (Larry Cipriani) writes:
> My preference is for a program (perhaps a phase of an optimizing
> compiler) that would take source code and generate optimized *source*
> code.  Additionally, messages saying why the transformations are
> better would be great to have.  I believe such programs are available
> for Fortran (or do these concentrate on vectorization) but I've never
> seen one for C.

Even though a source-level "optimization" is machine-independent in
that its legality does not depend on the architecture, whether any
particular optimization will improve the quality of the code cannot
be known unless you know something about the architecture.  The types
of source-level optimizations done for a parallel architecture might
be quite different from those done for a traditional uniprocessor.

An example of an "optimization" that could turn out to decrease code
quality is common subexpression elimination.  Saving the value of the
common subexpression requires the allocation of an extra register.
It's possible that the negative effect of tying up the extra register
more than cancels what is gained by eliminating the redundant
computation.  You need information about the particular architecture
you are targeting in order to make this decision.
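
To make the tradeoff concrete, here is a small sketch in C.  The function
names and the expression are made up for illustration only.  The second
version is what a source-level CSE pass might emit; whether it is actually
faster depends on whether the target can spare a register for "t".

```c
/* Original form: the subexpression (a * b) is written, and possibly
 * computed, twice. */
long without_cse(long a, long b, long c, long d)
{
    return (a * b + c) * (a * b + d);
}

/* After source-level CSE: the product is computed once and kept in t.
 * t is live across both uses, so on a register-starved target it may
 * force a spill that costs more than the multiply it saved. */
long with_cse(long a, long b, long c, long d)
{
    long t = a * b;
    return (t + c) * (t + d);
}
```

Both functions return the same value for the same arguments; only the cost
differs, and that cost is what the architecture determines.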

Another example: I once programmed on an architecture on which the fastest
way to produce certain constants, because of the architecture's addressing
modes, was to "unfold" the constant into two simpler ones (e.g., 1072 becomes
"134 lsh 3").  Thus, even constant folding is not always the optimal thing
to do.
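
In C terms the unfolding looks roughly like the sketch below, assuming
"lsh" means a logical left shift.  The function names are hypothetical.
Note that most C compilers will fold the second form back into a single
literal, so this only shows that the two forms are equivalent, not how
that machine generated them.

```c
/* Folded form: the constant is a single literal. */
unsigned folded_constant(void)
{
    return 1072u;
}

/* "Unfolded" form: 134 shifted left by 3 bits, i.e. 134 * 8 = 1072. */
unsigned unfolded_constant(void)
{
    return 134u << 3;
}
```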

A microarchitecture that I was marginally familiar with had such a fast
multiplier that the fastest way to perform a shift of more than some very
small number of bits was to transform it into an equivalent multiplication by
a power of two.  On such an architecture, a strength reduction that
transforms a multiplication into a shift might DECREASE code quality.
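
The equivalence in question can be sketched in C as follows.  The function
names are mine, and the shift count k is assumed to be smaller than the
width of unsigned in bits.  Unsigned arithmetic is used so that both forms
wrap around in the same way.

```c
/* Shift form: what a conventional strength reduction would produce. */
unsigned shift_left(unsigned x, unsigned k)
{
    return x << k;
}

/* Multiply form: the same result as a multiplication by 2^k, which
 * could be the faster choice on a machine with a very fast multiplier. */
unsigned mul_pow2(unsigned x, unsigned k)
{
    return x * (1u << k);
}
```

Which of the two a compiler should emit is exactly the kind of decision
that cannot be made from the source alone.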

All this aside, IF you have a pretty good idea what kind of architecture
you're targeting, source-level optimization can generally be quite
effective.

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon