Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased From: chased@rbbb.Eng.Sun.COM (David Chase) Newsgroups: comp.lang.misc Subject: Aggressive optimization Summary: Program restructuring is well understood, and yields impressive benefits Message-ID: <1458@exodus.Eng.Sun.COM> Date: 17 Oct 90 18:54:26 GMT References: <65697@lanl.gov> Sender: news@exodus.Eng.Sun.COM Organization: Sun Microsystems, Mt. View, Ca. Lines: 106 > Aggressive optimization is the idea that restructuring a program *in > the course of translation* is both feasible and advantageous, and > desirable (cost effective). I'll grant that. > My contention is that logic restructuring optimizations are more cost > effective instead at the source level, whether automatic or manual, > and that often automatic is not necessary, manual is sufficient. Contention refuted, I think. Monday, I went to a nice talk by Michael Wolf of Stanford U on blocking (of matrix algorithms) for (register, cache, VM) locality and parallelism. These transformations are well understood, have clear conditions for correctness, and though they could be performed at the source level, doing them by hand is extremely tedious and far more likely to be incorrect. Also, choices must be made based on cache structure and array size; it can be profitable to make these choices at RUN time. The examples shown in the talk involved transformation of three (nested) loops into five (nested) loops. I believe you'd call this "aggressive". Performance boost: a factor of 3 or 4 for common matrix algorithms on two different machines. Note that the exact blocking chosen is machine-dependent. Other dependence-based transformations (scalarization, software pipelining, vectorization, parallelization) are also well understood, have clear conditions for correctness, and yield ample (often integer factors) increases in speed. > My main reason for surmising so is that automatic program analysis and > rewriting is often far more difficult than code planning and > synthesis, This is quite true, if (1) you ignore economies of scale and (2) you ignore those optimizations that are not expressible in the source language. The tremendous effort put into writing an optimizing compiler (and getting it correct) is paid back many times over in its later use. The work of a small number of people in this building improves the performance of code written by anyone using our compilers. > and the benefits are not as often competitive with those of more > limited and safer alternatives (source analysis and rewriting). You make two mistakes here: First, you act as if one must EITHER do manual optimization, OR do compiler optimization, but not BOTH. Second, compiler optimization IS more reliable than manual optimization, especially if the manual optimizer attempts to perform the same transformations (AND, the optimizer will optimize for different machines, unlike the manual optimizer). How often do you catch an optimizer bug, and often do you make a mistake hand-optimizing code? Consider, for example, a piece of code (written in C) I wrote demonstrating a new approach to solving a problem (it's the same example I've always used, because it's the only piece of code I've bothered that much over; usually, good algorithms, correctness, and compiler optimization are enough). First we have the algorithmic improvements -- there's about a factor of 20 - 1000 on the interesting problems. Then, we do some profiling to look for hot spots, and perform some optimizations by hand (that the compilers then did not do, but WILL do now, saving me the bother of doing it by hand and checking my work). Another factor of 10. More hand optimization, and it's 3 times faster yet. Now, that's a factor of thirty by hand (except that the compilers nowadays would have gotten a good chunk of that), but I started from a very careful implementation to be sure that I got it right. Even so, running the code through an optimizing compiler got me another factor of 1.67 on top of the factor of 30. It took the optimizer a couple of minutes to do that; it took me several weeks, spread out over a couple of years, and collaborating with other people using (and DEBUGGING -- I made a couple of mistakes along the way) the program to get the factor of 30. > More shortly: when -O[2345....] will not raise substantially the > probability of getting broken code in most compilers, when it will not > result in huge increases in compile time or space, and when it will > give substantially better results than hand tuned code, then I will > BELIEVE! If you get broken code from a compiler, the people who wrote it probably want to know, and will almost certainly fix it in a future release of their compiler. I believe this holds for DEC, FSF, HP, IBM, Metaware, MIPS, Sun, and Tartan Labs. The increases in compile time and space are regrettable, but that is a choice that you get to make. My computer, at least, doesn't need me to hold its hand while it compiles code, and I don't run the compiler at -O666 every day. Choice of optimization level during development, debugging, testing, and performance tuning is just a matter of allocating your time sensibly. (which is why I'm spending time replying to you, eh?) As for "substantially better" results, I believe that is the case already -- the compiler is certainly more often correct, and since I get to do *both* hand and compiler optimizations, I let the compiler go first, then I look for hot spots that it missed. I don't know about you, but my time is more valuable than CPU time on most computers. David Chase Sun Microsystems, Inc.