Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg From: pcg@aber-cs.UUCP (Piercarlo Grandi) Newsgroups: comp.lang.misc Subject: Re: Aggressive optimization Summary: What's bad is not clever implementation of specifications, but changing one implementation with another that supposedly implements the same specification. Message-ID: <2060@aber-cs.UUCP> Date: 21 Oct 90 17:18:51 GMT Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi) Organization: Dept of CS, UCW Aberystwyth (Disclaimer: my statements are purely personal) Lines: 220 I was writing: pcg> Aggressive optimization is the idea that restructuring a program *in pcg> the course of translation* is both feasible and advantageous, and pcg> desirable (cost effective). In article <1458@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes: chased> I'll grant that. pcg> My contention is that logic restructuring optimizations are more cost pcg> effective instead at the source level, whether automatic or manual, pcg> and that often automatic is not necessary, manual is sufficient. chased> Contention refuted, I think. chased> Monday, I went to a nice talk by Michael Wolf of Stanford U on chased> blocking (of matrix algorithms) for (register, cache, VM) locality chased> and parallelism. [ ... ] chased> These transformations are well understood, have clear conditions chased> for correctness, and though they could be performed at the source chased> level, doing them by hand is extremely tedious and far more likely chased> to be incorrect. What you are saying is that this guy has some technology to synthetize high performance matrix algorithms depending on the target architecture, and that such synthesis can be reliably done by program. This is excellent news, but does not necessarily mean that the technology is about aggressive optimization. The same might be used for clever code generation, which is absolutely OK. chased> Also, choices must be made based on cache structure and chased> array size; it can be profitable to make these choices chased> at RUN time. On the fly code generation, thunks. Excellent. Not related in any way to aggressive optimization. chased> The examples shown in the talk involved transformation of three chased> (nested) loops into five (nested) loops. I believe you'd call this chased> "aggressive". Frankly, I'd call them "crazy" if used as optimizations. Very clever and safe if used in a high level code generator. It all depends on the abstraction level at which you apply them. chased> Performance boost: a factor of 3 or 4 for common matrix algorithms chased> on two different machines. Note that the exact blocking chosen is chased> machine-dependent. The real point here is that the guy has a better algorithm for matrix operations than the standard one, so what he is doing is pattern matching for occurrence of it, and substituting the standard one for his better one. This begs the question: why not just write 'm * n' and let the code generator (or the library manager) synthetize the right algorithm? My answer: it is crazy to have programmers write specific algorithm implementations, and then have the compiler pattern match for the implementation, deduce the algorithm from the implementation, and then rewrite the implementation with another one that (supposedly) implements the same algorithm. IMNHO it is much more effective for example to use a source rewriting tool that scans the programmer code for implementations, and substitutes them with higher level constructs. The idea is that the source should be true; it should either contain an implementation that is literally translated *or* a specification that the compiler can implement as it pleases, and not something halfway. Everything else is a program with confusion in its abstraction levels. pcg> My main reason for surmising so is that automatic program analysis and pcg> rewriting is often far more difficult than code planning and synthesis, chased> This is quite true, if (1) you ignore economies of scale and (2) you If the economies of scale are "don't touch dusty decks", there is an unresolved argument as to whether it is more economical to rewrite manually or automatically the dusty decks in a form more amenable to clever code generation, or leave them alone and work on the generated code. I have seen too many dusty decks improve beyond recognition with a little manual or automatic restructuring (as Knuth says in vol. 1 of the ACP), so I tend to think that even for dusty decks explicit (automatic or manual) restructuring is better than aggressive optimization. If we are speaking of newly designed programs, then probably it is always cheaper to write the programs right (at the appropriate abstraction level) in the first place than to restructure them in the code generator. Unless of course there is not only the "dusty deck" problem, but the "dusty programmer" one. (the programmer can only write dusty decks, and cannot be taught to write programs in a higher level notation, e.g. using procedure calls :->). To those who think that I am joking here: well, no. I know physicists and mathematicians that program in Fortran without using subroutines. They don't understand them. I have even met a physicist that did not know about loops (or block datas) -- but at least he knew about the local version of the NAG library, so he initialized his matrixes with hundreds of assignment statements, and then called a library subroutine to do the real work. But then we would be discussing sociological issues, not computer science. chased> The work of a small number of people in this building chased> improves the performance of code written by anyone using our chased> compilers. No, it just gives the illusion that you can have faster rerunning of dusty decks without touching them. You would be doing a much better service to anyone if you wrote source program restructuring tools, and then clever code generators. pcg> and the benefits are not as often competitive with those of more pcg> limited and safer alternatives (source analysis and rewriting). Because once you have turned a low level implementation in a higher level one it has become magically more portable, and you can also have better confidence in the program correctness. I am far more confident of the correctness of an SQL query one page long than of the equivalent implementation in COBOL which probably is several dozen pages long. If the SQL code generator is clever it is also likely that the SQL version is faster too, and on a wider range of machines than the COBOL one. I would not trust as much for either portability of speed a COBOL compiler that would deduce little by little the query the program is supposed to implement, and then rewrite the COBOL program's logic into a different implementation for the same query. chased> If you get broken code from a compiler, the people who wrote it chased> probably want to know, and will almost certainly fix it in a future chased> release of their compiler. I believe this holds for DEC, FSF, HP, chased> IBM, Metaware, MIPS, Sun, and Tartan Labs. Maybe, but there is a thing called software physics that tells us that bugs are strictly related to complexity, and virtually the only way to drive down bugs is to reduce complexity, not to report them as they occur. chased> I don't know about you, but my time is more valuable than CPU time chased> on most computers. But this is a good argument to use higher level languages with clever code generators! Or to let expensive programmer's code be literally obeyed. I think that if the programmer's time is valuable the best solution is not to have him write low level implementations that are ignored by the compiler and restructured into different but equivalent implementations of the same specification, but to have the programmer write the specification and then the compiler or library manager generate or select the optimal implementation straight away. And if the programmer writes a specific implementation, then that must be because of a very good reason, and it ought to be respected. Dan Bernstein and Herman Rubin only present one side of the anti aggressive optimization argument -- that implementations should not be rewritten by the language translator. The other side, mine, is that specifications should be translated in the cleverest way possible. The aggressive optimizers believe that implementations should be translated in the cleverest way possible. The Bernstein/Rubin assumption is that programmers know what they are doing and write in a low level language which therefore ought to be translated literally; the Chase/Giles assumption is that programmers don't know what they are doing and still write in a low level language which therefore ought to be translated liberally. My argument is that programmers should use the language at the abstraction level suitable to them and their applications, and that low level languages ought to be translated literally and high level ones liberally, and that translating liberally implementations in low level languages is a monstrosity. When you have a high level specification and you generate clever code for it you only have to show that the synthetized implementation satisfies that specification; when you do aggressive optimization on an implementation written in a low level language you also have to show that you have correctly deduced the specification from the implementation you are overriding. This is a completely different and far more difficult task, and should not be done in a hidden way by the compiler, on safety grounds alone. It should be done IMNHO either by the programmer itself (manual algorithm reimplementation) or by automatic tools on the source, so that the new implementation is documented in it. And I think that in most cases restructuring should not be horizontal, i.e. from original inappropriate implementation to new better implementation (three nested loops to five nested loops), but from low level to high level (three nested loops to 'm * n'), and separately from high levle to low level ('m * n' expanded in the locally optimal version, maybe five nested loops, either by high level code generation or library selection). I am not against clever code generation for high level languages -- I am against compilers trying to second guess programmer's implementations in low level ones. I would like aggressive optimization advocates to contend with this point, not with the straw one that better implementations are better. The question is not: 1) Can a compiler restructure an algorithm's implementation into a supposedly equivalent supposedly faster implementation as a matter of course during code analysis and generation? but it is: 2) Isn't it rather more cost effective to write new programs in a more abstract and portable way so that clever code generation can be applied to them, and to restructure old programs in source form so that they are brought to the same more abstract level either manually or automatically? My answers are: 1) yes, it is possible to do it but with non neglibile loss in confidence. 2) yes, because it results in long term benefits as well as short term ones. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk