Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased
From: chased@rbbb.Eng.Sun.COM (David Chase)
Newsgroups: comp.lang.misc
Subject: Aggressive optimization
Summary: Program restructuring is well understood, and yields impressive benefits
Message-ID: <1458@exodus.Eng.Sun.COM>
Date: 17 Oct 90 18:54:26 GMT
References: <PCG.90Oct13114804@odin.cs.aber.ac.uk> <65697@lanl.gov> <PCG.90Oct16160125@teachc.cs.aber.ac.uk>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 106

> Aggressive optimization is the idea that restructuring a program *in
> the course of translation* is both feasible and advantageous, and
> desirable (cost effective).

I'll grant that.

> My contention is that logic restructuring optimizations are more cost
> effective instead at the source level, whether automatic or manual,
> and that often automatic is not necessary, manual is sufficient.

Contention refuted, I think.

Monday, I went to a nice talk by Michael Wolf of Stanford U on
blocking (of matrix algorithms) for (register, cache, VM) locality and
parallelism.  These transformations are well understood, have clear
conditions for correctness, and though they could be performed at the
source level, doing them by hand is extremely tedious and far more
likely to be incorrect.  Also, choices must be made based on cache
structure and array size; it can be profitable to make these choices
at RUN time.  The examples shown in the talk involved transformation
of three (nested) loops into five (nested) loops.  I believe you'd
call this "aggressive".

Performance boost: a factor of 3 or 4 for common matrix algorithms on
two different machines.  Note that the exact blocking chosen is
machine-dependent.

Other dependence-based transformations (scalarization, software
pipelining, vectorization, parallelization) are also well understood,
have clear conditions for correctness, and yield ample (often integer
factors) increases in speed.

> My main reason for surmising so is that automatic program analysis and
> rewriting is often far more difficult than code planning and
> synthesis,

This is quite true, if (1) you ignore economies of scale and (2) you
ignore those optimizations that are not expressible in the source
language.  The tremendous effort put into writing an optimizing
compiler (and getting it correct) is paid back many times over in its
later use.  The work of a small number of people in this building
improves the performance of code written by anyone using our
compilers.

> and the benefits are not as often competitive with those of more
> limited and safer alternatives (source analysis and rewriting).

You make two mistakes here:
First, you act as if one must EITHER do manual optimization, OR do
compiler optimization, but not BOTH.

Second, compiler optimization IS more reliable than manual
optimization, especially if the manual optimizer attempts to perform
the same transformations (AND, the optimizer will optimize for
different machines, unlike the manual optimizer).  How often do you
catch an optimizer bug, and often do you make a mistake
hand-optimizing code?

Consider, for example, a piece of code (written in C) I wrote
demonstrating a new approach to solving a problem (it's the same
example I've always used, because it's the only piece of code I've
bothered that much over; usually, good algorithms, correctness, and
compiler optimization are enough).  First we have the algorithmic
improvements -- there's about a factor of 20 - 1000 on the interesting
problems.  Then, we do some profiling to look for hot spots, and
perform some optimizations by hand (that the compilers then did not
do, but WILL do now, saving me the bother of doing it by hand and
checking my work).  Another factor of 10.  More hand optimization, and
it's 3 times faster yet.

Now, that's a factor of thirty by hand (except that the compilers
nowadays would have gotten a good chunk of that), but I started from a
very careful implementation to be sure that I got it right.  Even so,
running the code through an optimizing compiler got me another factor
of 1.67 on top of the factor of 30.  It took the optimizer a couple of
minutes to do that; it took me several weeks, spread out over a couple
of years, and collaborating with other people using (and DEBUGGING --
I made a couple of mistakes along the way) the program to get the
factor of 30.

> More shortly: when -O[2345....] will not raise substantially the
> probability of getting broken code in most compilers, when it will not
> result in huge increases in compile time or space, and when it will
> give substantially better results than hand tuned code, then I will
> BELIEVE!

If you get broken code from a compiler, the people who wrote it
probably want to know, and will almost certainly fix it in a future
release of their compiler.  I believe this holds for DEC, FSF, HP,
IBM, Metaware, MIPS, Sun, and Tartan Labs.  

The increases in compile time and space are regrettable, but that is a
choice that you get to make.  My computer, at least, doesn't need me
to hold its hand while it compiles code, and I don't run the compiler
at -O666 every day.  Choice of optimization level during development,
debugging, testing, and performance tuning is just a matter of
allocating your time sensibly. (which is why I'm spending time
replying to you, eh?)  As for "substantially better" results, I
believe that is the case already -- the compiler is certainly more
often correct, and since I get to do *both* hand and compiler
optimizations, I let the compiler go first, then I look for hot spots
that it missed.  I don't know about you, but my time is more valuable
than CPU time on most computers.

David Chase
Sun Microsystems, Inc.