Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.misc
Subject: Re: Aggressive optimization
Summary: What's bad is not clever implementation of specifications,
	but changing one implementation with another that supposedly
	implements the same specification.
Message-ID: <2060@aber-cs.UUCP>
Date: 21 Oct 90 17:18:51 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 220

I was writing:

pcg> Aggressive optimization is the idea that restructuring a program *in
pcg> the course of translation* is both feasible and advantageous, and
pcg> desirable (cost effective).

In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase)
writes:

chased> I'll grant that.

pcg> My contention is that logic restructuring optimizations are more cost
pcg> effective instead at the source level, whether automatic or manual,
pcg> and that often automatic is not necessary, manual is sufficient.

chased> Contention refuted, I think.

chased> Monday, I went to a nice talk by Michael Wolf of Stanford U on
chased> blocking (of matrix algorithms) for (register, cache, VM) locality
chased> and parallelism. [ ... ]

chased> These transformations are well understood, have clear conditions
chased> for correctness, and though they could be performed at the source
chased> level, doing them by hand is extremely tedious and far more likely
chased> to be incorrect.

What you are saying is that this guy has some technology to synthetize high
performance matrix algorithms depending on the target architecture, and that
such synthesis can be reliably done by program. This is excellent news, but
does not necessarily mean that the technology is about aggressive
optimization. The same might be used for clever code generation, which is
absolutely OK.

chased> Also, choices must be made based on cache structure and
chased> array size; it can be profitable to make these choices
chased> at RUN time.

On the fly code generation, thunks. Excellent. Not related in any way to
aggressive optimization.

chased> The examples shown in the talk involved transformation of three
chased> (nested) loops into five (nested) loops.  I believe you'd call this
chased> "aggressive".

Frankly, I'd call them "crazy" if used as optimizations. Very clever and
safe if used in a high level code generator. It all depends on the
abstraction level at which you apply them.

chased> Performance boost: a factor of 3 or 4 for common matrix algorithms
chased> on two different machines.  Note that the exact blocking chosen is
chased> machine-dependent.

The real point here is that the guy has a better algorithm for matrix
operations than the standard one, so what he is doing is pattern matching
for occurrence of it, and substituting the standard one for his better one.

This begs the question: why not just write 'm * n' and let the code
generator (or the library manager) synthetize the right algorithm?

My answer: it is crazy to have programmers write specific algorithm
implementations, and then have the compiler pattern match for the
implementation, deduce the algorithm from the implementation, and then
rewrite the implementation with another one that (supposedly) implements the
same algorithm.

IMNHO it is much more effective for example to use a source rewriting tool
that scans the programmer code for implementations, and substitutes them
with higher level constructs. The idea is that the source should be true; it
should either contain an implementation that is literally translated *or* a
specification that the compiler can implement as it pleases, and not
something halfway. Everything else is a program with confusion in its
abstraction levels.

pcg> My main reason for surmising so is that automatic program analysis and
pcg> rewriting is often far more difficult than code planning and synthesis,

chased> This is quite true, if (1) you ignore economies of scale and (2) you

If the economies of scale are "don't touch dusty decks", there is an
unresolved argument as to whether it is more economical to rewrite manually
or automatically the dusty decks in a form more amenable to clever code
generation, or leave them alone and work on the generated code.

I have seen too many dusty decks improve beyond recognition with a little
manual or automatic restructuring (as Knuth says in vol. 1 of the ACP), so I
tend to think that even for dusty decks explicit (automatic or manual)
restructuring is better than aggressive optimization.

If we are speaking of newly designed programs, then probably it is always
cheaper to write the programs right (at the appropriate abstraction level)
in the first place than to restructure them in the code generator.

Unless of course there is not only the "dusty deck" problem, but the "dusty
programmer" one.  (the programmer can only write dusty decks, and cannot be
taught to write programs in a higher level notation, e.g. using procedure
calls :->).

    To those who think that I am joking here: well, no. I know physicists
    and mathematicians that program in Fortran without using subroutines.
    They don't understand them. I have even met a physicist that did not
    know about loops (or block datas) -- but at least he knew about the
    local version of the NAG library, so he initialized his matrixes with
    hundreds of assignment statements, and then called a library subroutine
    to do the real work.

But then we would be discussing sociological issues, not computer science.

chased> The work of a small number of people in this building
chased> improves the performance of code written by anyone using our
chased> compilers.

No, it just gives the illusion that you can have faster rerunning of dusty
decks without touching them. You would be doing a much better service to
anyone if you wrote source program restructuring tools, and then clever code
generators.

pcg> and the benefits are not as often competitive with those of more
pcg> limited and safer alternatives (source analysis and rewriting).

Because once you have turned a low level implementation in a higher level
one it has become magically more portable, and you can also have better
confidence in the program correctness.

I am far more confident of the correctness of an SQL query one page long
than of the equivalent implementation in COBOL which probably is several
dozen pages long. If the SQL code generator is clever it is also likely that
the SQL version is faster too, and on a wider range of machines than the
COBOL one.

I would not trust as much for either portability of speed a COBOL compiler
that would deduce little by little the query the program is supposed to
implement, and then rewrite the COBOL program's logic into a different
implementation for the same query.

chased> If you get broken code from a compiler, the people who wrote it
chased> probably want to know, and will almost certainly fix it in a future
chased> release of their compiler.  I believe this holds for DEC, FSF, HP,
chased> IBM, Metaware, MIPS, Sun, and Tartan Labs.  

Maybe, but there is a thing called software physics that tells us that bugs
are strictly related to complexity, and virtually the only way to drive down
bugs is to reduce complexity, not to report them as they occur.

chased> I don't know about you, but my time is more valuable than CPU time
chased> on most computers.

But this is a good argument to use higher level languages with clever code
generators! Or to let expensive programmer's code be literally obeyed.

I think that if the programmer's time is valuable the best solution is not
to have him write low level implementations that are ignored by the compiler
and restructured into different but equivalent implementations of the same
specification, but to have the programmer write the specification and then
the compiler or library manager generate or select the optimal
implementation straight away. And if the programmer writes a specific
implementation, then that must be because of a very good reason, and it
ought to be respected.

Dan Bernstein and Herman Rubin only present one side of the anti aggressive
optimization argument -- that implementations should not be rewritten by the
language translator. The other side, mine, is that specifications should be
translated in the cleverest way possible.  The aggressive optimizers believe
that implementations should be translated in the cleverest way possible.

The Bernstein/Rubin assumption is that programmers know what they are doing
and write in a low level language which therefore ought to be translated
literally; the Chase/Giles assumption is that programmers don't know what
they are doing and still write in a low level language which therefore ought
to be translated liberally.

My argument is that programmers should use the language at the abstraction
level suitable to them and their applications, and that low level languages
ought to be translated literally and high level ones liberally, and that
translating liberally implementations in low level languages is a monstrosity.

When you have a high level specification and you generate clever code for it
you only have to show that the synthetized implementation satisfies that
specification; when you do aggressive optimization on an implementation
written in a low level language you also have to show that you have
correctly deduced the specification from the implementation you are
overriding. This is a completely different and far more difficult task, and
should not be done in a hidden way by the compiler, on safety grounds alone.

It should be done IMNHO either by the programmer itself (manual algorithm
reimplementation) or by automatic tools on the source, so that the new
implementation is documented in it. And I think that in most cases
restructuring should not be horizontal, i.e. from original inappropriate
implementation to new better implementation (three nested loops to five
nested loops), but from low level to high level (three nested loops to 'm *
n'), and separately from high levle to low level ('m * n' expanded in the
locally optimal version, maybe five nested loops, either by high level code
generation or library selection).

I am not against clever code generation for high level languages -- I am
against compilers trying to second guess programmer's implementations in low
level ones. I would like aggressive optimization advocates to contend with
this point, not with the straw one that better implementations are better.

The question is not:

1) Can a compiler restructure an algorithm's implementation into a supposedly
equivalent supposedly faster implementation as a matter of course during
code analysis and generation?

but it is:

2) Isn't it rather more cost effective to write new programs in a more
abstract and portable way so that clever code generation can be applied to
them, and to restructure old programs in source form so that they are
brought to the same more abstract level either manually or automatically?

My answers are:

1) yes, it is possible to do it but with non neglibile loss in confidence.

2) yes, because it results in long term benefits as well as short term ones.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk