Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!think.com!barmar From: barmar@think.com (Barry Margolin) Newsgroups: comp.lang.misc Subject: Re: Aggressive optimization Message-ID: <1990Nov2.001315.28208@Think.COM> Date: 2 Nov 90 00:13:15 GMT References: <1990Oct30.105241.9733@tkou02.enet.dec.com> <1849@exodus.Eng.Sun.COM> Sender: news@Think.COM Organization: Thinking Machines Corporation, Cambridge MA, USA Lines: 195 In article pcg@cs.aber.ac.uk (Piercarlo Grandi) writes: >A program in some language, at any level of abstraction, defines an >higher level virtual machine than that offered by the language. The >primitives of the virtual machine defined by a program (most obviously, >but only, its procedures and macros) are actually composed out of the ^^^^^^^^ Should this have been "but not only"? >primitives and combinators of the language, and their meaning is defined >according to the algebra for these as given in the language definition. > >Aggressive optimization is mapping from the virtual machine defined by >the program (not by the language!) to the real machine *in the code >generator*. So aggressive optimization refers to optimization that does things like interprocedural flow analysis? Are loop unrolling, strength reduction, and common subexpression elimination aggressive or not? >This is in way of principle possible, because the code generator can >indeed recognize combinations of primitive operators and rewrite them to >other combinations that supposedly map better onto the target real >machine, as long as the two are equivalent under the algebra of the >language definition, in that particular context. Code generators, and certainly unagressive optimizers, are almost always required to "recognize combinations of primitive operators". The only optimizers that are independent of the primitive operators are peephole optimizers. For instance, in a language with a "+" operator that accepts arguments of either float or integer, and with conversion operators float(x) and integer(x), the compiler would be expected to emit different code for float(x)+float(y) than for integer(x)+integer(y). To do this it must recognize the combination of the conversion operators with the "+" operator. To some extent a compiler is simply an optimizer for an interpreter. The dividing lines between code generation, ordinary optimization, and aggressive optimization are very fuzzy. >The problems are: > >Is this desirable? > > No, because the rewriting algebra of the language may be poorly > defined, or poorly understood, and thus it may be difficult to rely > on the ability of the code generator to rewrite combinations of > language primitives in semantics preserving ways. This is > particularly true for languages like C, where a clean algebra for > rewriting was not a design goal. Ansi C have tried to do some poor > fixup engineering on this. We are not amused. Where does this "rewriting algebra" come from, and what makes it poorly defined? I don't think the ANSI C standard includes a rewriting algebra; I presume it's a function of the language's semantics. So, the rewriting algebra could conceivably be as well defined as the language itself. A poorly defined rewriting algebra could be the result of a poorly defined language, or because the designer of the rewriting algebra did a poor job. > No, because even assuming that the underlying algebra is clean, > rewriting combinations of primitives is a far more concpetually > difficult task than mapping each of them to the real machine. In > particular it requires careful analysis. Such analysis tends to > result in more bugs in the compiler, and the compiler is the > bottleneck of an entire system's reliability. Many of the things we do with computers were once thought to be too complex to program, but that doesn't mean we shouldn't have tried to implement them. AI is an example of a computer task that we still consider conceptually difficult, but we keep plodding on trying to solve it. Therefore, my interpretation of the above is that you think aggressive optimization should still be considered a research project, not something that should be expected in commercial products. > No, because even if two combinations of primitives can actually be > shown in the given context to be equivaltn according to the > language's algebra, it ought to be assumed that the programmer > intended to write the particular one he chose to write for some > good reason. Every good treatise on writing efficient programs generally starts with something to the effect that one should write programs clearly first, and then go back and optimize the bottlenecks. So, the programmer is likely to have chosen a combination of primitives that is closer to his algorithm than to the real machine. The programmer may not be aware of the physical machine primitives that the language primitives map onto, or may believe he knows but be mistaken (a particular compiler's mapping between virtual and physical machines is not generally documented, except perhaps in the area of data formats), so his "good reason" may actually be a "bad reason". Also, many programs are targeted for multiple (or unspecified) physical machines and/or compilers, so the programmer may not be able to choose language primitives that are good for all the target environments. > No, because the speed advantages are not that dramatic (or > nonexistent), *if* the author is a clever programmer that knows > which of the equivalent formaulations will be optimal on the given > machine. See above -- the program may not be targetted to a specific "given machine". >Are there alternatives? > > Yes, rewriting the combinations of primitives not in the code > generator, but at the source code level, either manually or > automatically. Are you saying that this should be done in a module that is independent of the target machine and compiler? I was under the impression that most complex optimization was already beind done using source-source translation, although sometimes it is necessary for the target source to include nonstandard, implementation-dependent primitives that map more directly onto the target hardware. For instance, strength reduction and common subexpression elimination translate the source into source with an additional temporary variable. Also, I don't think it matters what the destination of the rewrite is. I think the hard part is recognizing large combinations of primitives and reasoning about them. Transformation bugs usually result from unexpected interations between seemingly unrelated portions of the code (e.g. due to use of global variables or aliasing). This is one reason why much of the program transformation research has been done with functional languages: the absence of side effects makes semantic analysis much easier. > Yes, using a language whose primitives are at the abstraction level > desired, i.e. the primitives offered by the language are those > needed for the application. In this case the optimization is not > aggressive, because the generator is only doing optimal > implementation of hopefully well defined language primitives, not > supposedly optimal rewriting of pieces of a user program. Unfortunately, the computer industry doesn't abound with application-specific languages. And even if there were, finding programmers proficient in many of them would be hard. [Enter biased mode] It's hard enough finding people proficient in the better general-purpose languages; most schools just churn out Pascal and C programmers. [Exit biased mode] >Are the alternatives cost effective? > > Yes, because for *many* even if not all codes the time critical > portion is small enough that careful automatic rewriting is > feasible. How does the automatic rewriter know which parts are time critical? In many cases, program optimization is an incremental process: you find the bottleneck, rewrite it efficiently, find the new bottleneck, rewrite it, etc., until the overall efficiency meets the goals. > Yes, because automatic rewriting can be to a higher level language, > which can then be more easily and non aggressively optimized, and > automatic rewriting gives long lasting benefits in portability and > confidence in the program. Ah, now I see that you're talking about *replacing* the source file with this automatically rewritten source file. I thought you were talking about source-level transformations internal to the compiler (usually operating on the parse-tree, not the actual source text). Since the purpose of the transformation is to produce code that will map better by a specific compiler onto a particular hardware, I don't see the portability benefit of saving the transformed code. > Yes, because using a language whose primitives are at the level of > the problem is known to reduce costs and improve quality via reuse > of the language's implementation or libraries, where otherwise the > implementation of thsoe primitives would have to be rewritten and > re-aggressively optimized for every application. This is true if you can find such a language and programmers who know it. If the application is not very common then you will probably have to spend quite a bit to purchase or develop the language implementation. Subroutine libraries for general purpose languages can often do nearly as well and are much easier to come by. >They are legal but dangerous and not cost effective, according to the >above discussion. You made the statement that these optimizations don't buy much, but didn't back it up with any evidence. >chased> The compiler-writer's claim is "I can prove that these >chased> optimizations conform to the standard"; > >STOP THE PRESSES! Compiler writer proves that he can do program >rewriting in a bug free way! Program synthethizers must be round the >corner! Major software companies stocks collapse! Very little can be *proven* about any program; there are just too many variables and dependencies. A compiler writer who makes such a statement is merely asserting that he's as confident about his optimizations as he is about the rest of his compiler. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar