Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.lang.misc
Subject: Re: Definition of "Aggressive optimization"
Message-ID: <PCG.90Nov11175811@odin.cs.aber.ac.uk>
Date: 11 Nov 90 17:58:11 GMT
References: <numerous> <1990Nov6.183955.394@lpi.liant.com>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 140
Nntp-Posting-Host: odin
In-reply-to: rcg@lpi.liant.com's message of 6 Nov 90 18:39:55 GMT

On 6 Nov 90 18:39:55 GMT, rcg@lpi.liant.com (Rick Gorton) said:

rcg> Seriously, I want to know language (and furthermore,
rcg> what compiler) is being used here that is being
rcg> used as the basis for some of the discussion going on:

barmar> So aggressive optimization refers to optimization that does
barmar> things like interprocedural flow analysis?  Are loop unrolling,
barmar> strength reduction, and common subexpression elimination
barmar> aggressive or not?

pcg> They are all aggressive. Even substituting a shift for a
pcg> multiplication or division by a power of two may be considered
pcg> aggressive in certain languages where the properties of the two are
pcg> not precisely identical.

rcg> Huh?  If the language spec says that shifting is not the same as
rcg> multiplying for powers of two, then it is not an AGGRESSIVE
rcg> optimization to convert the multiply into a shift, rather an
rcg> ERRONEOUS one.  And not just at the intermediate language level.

Well, I once called aggressive an optimization that involves considering
the meaning of multiple language primitives. Just to continue the
contrived example of shift vs. multiplication: if the language says that
*in every relevant case* shift and multiplication by 2^N, then they are
in effect the same thing (for those relevant cases) and can be
translated in the same way.

But: if the language for example says that shift and multiplication by 2^N
ar eonly equivalent if both operands are positive, and at least one of
the operands is a variable, then the correctness of the optimization
depends on analysis of the flow of values, and is thus aggressive.

barmar> The only optimizers that are independent of the primitive
barmar> operators are peephole optimizers.

pcg> Yep, yep. Long live peephole optimizers!

rcg> Yes, peephole optimizers are usually independent of the
rcg> intermediate language, but in the case specified (about no shifting
rcg> instead of multiplying) It is REQUIRED that the peepholer know some
rcg> things about the language involved, if it doesn't have hooks into
rcg> the intermediate language or symbol table to get storage
rcg> characteristics/requirements, etc.

Why? The peepholer need only know the properties of the machine, not of
the language being compiled. In other words it can rewrite the generated
code in any way it likes, independently of the source language, as long
as it can prove that in the given context the two machine instruction
sequences are equivalent.

Note that a peephole optimizer can be aggressive, and thus more or less
error prone -- if the machine architecture gives simple rules for
instruction sequence equivalencing then probably the peephole rewriter
will be less buggy and more aggressive than one that has to deal with a
more difficult architecture. Note that IMNHO aggressive peephole
optimizers are not a good bet either.

     [ ... a clever code generation example ... ]

rcg> A good peepholer is capable of doing all this, and much more.  In
rcg> fact, there are times when peepholers can (and will) monkey with
rcg> loops, and pull things out of them that don't need to be in them.
rcg> But none of this can be done if the language specifies that this
rcg> behavior is incorrect.

Assuming that the generated machine code is correct, the source language
does not matter. Of course the peepholer can work better if it makes use
of contextual information derived from the source, but this is not
necessary or desirable.

pcg> I intended to say that the code that effects the transformation adds to
pcg> the size and complexity of the compiler and therefore affects the
pcg> compiler's reliability....You have a large amount of hairy code in the
pcg> compiler's bowels, and this is baaaad.

rcg> Having large amounts of difficult code is only bad if you find
rcg> bugs or limitations in it that require major rewrites.

Ahem. Software physics... It can also be bad if the bugs affect the
correctness of every other program in a system. A code generator is not
just difficult, it is critical. Maybe the most critical code in a
system.

You know, even little trivial bugs, than can be corrected with a one
line patch, in a code generator can cause major problems: not in the
compiler, which still works 99%, but in all the compiled programs, which
may stop working at all in unanticipated ways. Do we want to increase
the chances for any type of bug in the code generator? Maybe, if the
programmers ("dusty programmer" problem) can be expected to be even
buggier.

rcg> In this case, we are no longer talking about aggressive
rcg> compilation, rather software [re]design, engineering, and
rcg> development issues.

Yep, precisely. But the motivation for doing aggressive optimization in
the first place is precisely engineering, i.e. cost effectiveness.
Aggressive optimization is not *necessary* for correctness or
functionality, only for reducing one of the costs of a program.

The probability of increasing the problems in a compiler is a software
engineering issue, just as the probability of increasing the speed at
which programs execute, just as are the consequences of using a code
generator level aggressive optimizer versus a source level rewriter.
These are all difficult sw engineering issues.

rcg> So how about some POSITIVE discussions about what kind of features
rcg> good software productivity tools have/need, instead of this
rcg> continuous optimization-bashing festival?

Note that I am not against optimization (intended as translation to
optimal code) at all, and not even against aggressive optimization; the
former is IMNHO very important, and the latter is too, as long as it is
performed on the source, or however outside the compiler.

All the difficult problems of aggressive optimization, stemming from
program logic analysis and rewriting, are very interesting and
important, only I don't think that the best place where to solve them is
the code generator.

Source level analysis and improvement is, in the EDP world, a very
important and commercially profitable activity. EDP organizations spend
immense amounts of money each year in cleaning, maintaining, converting
dusty decks.

What I would like to see is more of this going on in the scientific
sector as well. But maybe many numerical analysts (because it is about
these that we have been speaking mostly) are more stuffy than EDP
people, and they resist any idea of learning anything beyond what they
managed to get in their undergraduate Fortran or C classes (I'm not a
programmer! Fortran II is perfectly adequate for my research, and
besides I am too busy!).

With Herman Rubin as a counterexample, just to to confirm the rule,
maybe.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk