Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!clements
From: clements@cs.utexas.edu (Paul C. Clements)
Newsgroups: comp.software-eng
Subject: optimizing compiler technology ==> help for SE?
Keywords: optimizing compilers, performance, performance-critical systems
Message-ID: <963@lychee.cs.utexas.edu>
Date: 9 Nov 89 19:40:32 GMT
Organization: U. Texas CS Dept., Austin, Texas
Lines: 175

About ten days ago, I posted a request for information that asked, roughly,
"How much can we rely on optimizing compiler technology to get back the
performance cost that usually comes with the use of sound software
engineering techniques?"  I promised to post a summary of responses.  I got
over two dozen answers.

Over twenty responses were what I call "sociological" or "anecdotal".  These
(a) told me why we ought to use good techniques _anyway_; (b) told me how to
coerce or fool "recalcitrant" decision-makers into using the techniques
_anyway_; (c) told me why performance wasn't important; or (d) provided
anecdotal evidence that the performance penalty associated with sound
engineering techniques is a myth, or is real but not very big, or is real
and is big.  To these people, I say thank you for writing; you raised good
points (at least for your application area), but the problem you addressed
is not the one I had in mind.

I explicitly stated that performance was VERY important in my hypothetical
system, and I was fascinated by the number of people who either ignored
that, or assumed that it didn't really matter "in this day and age."  In the
application I have in mind: (a) every byte counts; (b) every millisecond
counts; (c) performance requirements are well-defined, tight, non-negotiable,
and hard to meet; (d) a program failing to meet its performance requirements
is useless; and (e) getting new hardware is not a possible alternative.

In this environment, people in charge will CHOOSE -- a conscious,
well-informed, reasoned decision -- high performance over just about any
other advantage you can name that derives from application of what we would
call sound software engineering practices.  They see no choice.  For
doubters, one example of this application domain is embedded tactical
avionics systems for military aircraft.

Well, of course many (most?) optimization problems are NP-complete, if not
undecidable.  However, there is a shocking (to me) lack of formality with
regard to optimization work.  I was hoping to find something like "to solve
optimization problem P, technique T will handle all cases of the form F, and
produce optimal code.  For a case not of form F, T will produce code no
worse than what it was given, and maintain its correctness."  No such luck.
All the optimization literature I've seen is of the form "Here's a technique
to solve this problem.  It works like this.  We ran it, and discovered that
either (a) it produces better code than an optimizing compiler we were
competing against; or (b) it produces at least as good code as a competitor,
but does it a little faster."

Short summaries of four intriguing "technical" responses that I received
follow; you may inquire directly of the author for further information.
Thanks for everyone's help.

P. C. Clements
University of Texas at Austin

(130 lines follow)

==============================================================================
>From soi!chip@harvard.harvard.edu Wed Nov 1 09:04:22 1989

My company is currently working on a DARPA-funded 5-year project that has as
one of its main goals precisely your paragraph.  The project is the joint
design and prototyping of a wide-spectrum language and programming
environment.  One underlying idea is to design the environment and the
language together.  One of our design goals is to allow the programmer to
use _very_ high level constructs at no penalty...  We pay _no_ overhead for
procedure calls unless it is required by the program...  We allow the
programmer to specify the relative cost of time and space, and most of our
optimization methods compute costs relative to these parameters before
choosing what to do....
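[An aside from me: "no overhead for procedure calls unless it is required" is
the kind of claim an inlining compiler can make good on.  The C++ sketch
below is mine, not theirs -- the names Counter, increment, and tally are
invented for illustration, and nothing in it is specific to their system.
It only shows a call that a compiler is free to replace with the body of the
callee, so the abstraction costs nothing at run time.  Whether a given
compiler actually does so is, of course, exactly what my original posting
was asking.]

    #include <cstdio>

    // A small abstract data type hidden behind a procedural interface.
    // Counter, increment, value, and tally are invented names.
    class Counter {
    public:
        Counter() : count(0) {}
        void increment()   { ++count; }   // trivial body: a prime inlining candidate
        int  value() const { return count; }
    private:
        int count;
    };

    int tally(int n) {
        Counter c;
        for (int i = 0; i < n; ++i)
            c.increment();    // once inlined, this is just ++count; no call overhead
        return c.value();
    }

    int main() {
        std::printf("%d\n", tally(10));   // prints 10
        return 0;
    }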
==============================================================================
From mauney@cscadm.ncsu.edu Fri Nov 3 15:52:05 1989

...The additional problems posed by information-hiding techniques are likely
to be

a) Indirection
   The concrete representation of an abstract data type is likely to be
   based on a pointer...

b) procedure calls
   Manipulating an object of abstract type requires a procedure call, which
   is expensive.  But procedure calls can be in-lined, even across modules...

c) need for inter-module analysis
   Inter-module analysis for optimization is theoretically the same as any
   other inter-procedural analysis...

d) dynamic binding
   ...Lisp partisans claimed to have (this) under control, and many good
   languages don't have dynamic binding.

e) Generic code
   With parameterized types, such as generic packages in Ada, we have a
   problem: generate one piece of code that is general enough to be used in
   all instantiations, or macro-expand distinct pieces of code for each
   instantiation?  Here is the time-space tradeoff.  But I don't see a
   difficult compilation process, only a question of how to let the
   programmer specify which way to go.  [A C++ sketch of the two choices
   follows this response.]

It's really a matter of being willing to delay many of the optimizations
until the whole program is assembled...
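[To make point (e) concrete, here is a minimal C++ sketch of the two
choices.  Everything in it -- swap_any, swap_typed, the 64-byte buffer -- is
invented for illustration.  The void* version is one shared piece of code
that pays for its generality with indirection at run time; the template
version is expanded by the compiler into a separate, fully typed copy per
instantiation -- faster code, but more of it.  The sketch only shows where
the tradeoff lives; which way to go is the knob mauney suggests exposing to
the programmer.]

    #include <cassert>
    #include <cstddef>
    #include <cstring>

    // One general routine shared by every element type: the "time" cost is
    // the indirection (void* arguments and an element size passed at run time).
    void swap_any(void* a, void* b, std::size_t size) {
        unsigned char tmp[64];             // illustration only: assume small elements
        assert(size <= sizeof tmp);
        std::memcpy(tmp, a, size);
        std::memcpy(a, b, size);
        std::memcpy(b, tmp, size);
    }

    // Expanded per instantiation: the compiler emits a separate, fully typed
    // copy for each T actually used -- direct code, at a cost in code space.
    template <class T>
    void swap_typed(T& a, T& b) {
        T tmp = a;
        a = b;
        b = tmp;
    }

    int main() {
        int x = 1, y = 2;
        swap_any(&x, &y, sizeof(int));     // shared code, indirect
        swap_typed(x, y);                  // one copy per type, direct
        double p = 1.5, q = 2.5;
        swap_typed(p, q);                  // a second instantiation: more code
        return 0;
    }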
=============================================================================
>From preston@rice.edu Tue Oct 31 15:09:21 1989

Your example, minimization of storage:  Janet Fabri wrote a thesis on the
subject.  She later worked for IBM on the ECS compiler project...  She used
ideas that were similar to later work done with graph coloring for register
allocation.  However, "minimal (not just improved)" is certainly NP-Complete
and probably undecidable.  Does it matter?  I don't expect an assembly
programmer achieves minimal in any large program.

Next subject, time/space efficiency:  Compilers can't do much here.  ...the
real tradeoffs are made during the program (algorithm) design.  (However,
see Jack Schwartz's work on SETL, e.g. "Automatic data structure choice in a
language of very high level", CACM, December 1975.)

...I can talk about what's up with optimizers, especially here at Rice.
Classically, optimizers have worried about one procedure at a time.  About
10 years ago, this area started to settle down, though there is still room
for improvement.  This attention to single procedures has led a lot of
people to worry about procedure call overhead (and to build 2000-line
procedures).  Of course, others began to study how to optimize whole
programs.  The family tree looks something like:

    Spillman
    Barth - Allen - Burke (others?)
          \
           Banning - Cooper, Kennedy (Torczon, Callahan, Balasundarum, ...)

A second area that's opening up is analysis of arrays.  Classically,
optimizers make pretty bad assumptions about arrays.  Recently though, a lot
of work has been done to understand patterns of array access in loops
(dependence analysis).  The initial motivation was vectorization, but people
are beginning to apply the same techniques to scalar machines.

So, we're beginning to understand arrays.  What about records, linked
structures, persistent objects, coroutines, first class functions,
continuations, ...?

==============================================================================
>From cline@cheetah.ece.clarkson.edu Fri Nov 3 14:13:29 1989

...A more powerful boost to SE is: "Can we use good SE to produce what is
actually *faster* code?"  The basic notion that I'll present is that "more
high-level information means more chance for high-level optimization."  I
cite recent research which Doug Lea (dl@oswego.oswego.edu) and I
(cline@sun.soe.clarkson.edu) are doing with annotating C++ code.  We call
the system "A++" for "Annotated C++".

Benefits: Correctness.  Formal verification can be employed to check whether
what you *said* (the code) is consistent with what you *meant* to say (the
annotations).  Thus *lots* more static (compile-time) analysis can be done
on annotated code.

Another benefit, although not immediately obvious, is Optimization.  It
turns out that the additional semantic information can be employed by the
compiler to help generate better code.  For example, many runtime
consistency and exception tests are redundant; formal verification can be
used to *prove* them to be redundant, so they can be (automatically)
removed.  This reduces actual runtime testing to a minimum (or *nearly*
minimal, anyway) without changing the source code.  Thus A++ ("Annotated
C++") can be thought of as an "Exception Optimizer."

==============================================================================
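[Finally, a sketch of the "exception optimizer" idea.  I do not know A++'s
actual notation, so the annotations below are just comments and assert()s,
and every name (Table, at, append, sum) is invented.  The point is only the
mechanism: once a verifier proves that the loop index always satisfies
at()'s precondition, the runtime range test inside at() is provably
redundant in that caller and the compiler is free to drop it.  Done by hand
this is just deleting an assert; the interesting claim is that annotations
let it happen automatically and safely, without changing the source.]

    #include <cassert>

    // Invented example; the //@ lines stand in for annotations -- this is
    // NOT the real A++ syntax.
    class Table {
    public:
        //@ invariant: 0 <= used && used <= SIZE
        int at(int i) const {
            //@ requires: 0 <= i && i < used
            assert(0 <= i && i < used);      // runtime consistency test
            return data[i];
        }
        void append(int v) {
            //@ requires: used < SIZE
            assert(used < SIZE);
            data[used++] = v;
        }
        int size() const { return used; }
    private:
        enum { SIZE = 100 };
        int data[SIZE];
        int used = 0;
    };

    int sum(const Table& t) {
        int s = 0;
        for (int i = 0; i < t.size(); ++i)
            s += t.at(i);    // 0 <= i < t.size() holds on every iteration, so
                             // a verifier can prove at()'s test redundant here
                             // and the compiler can remove it automatically
        return s;
    }

    int main() {
        Table t;
        t.append(3);
        t.append(4);
        return sum(t) == 7 ? 0 : 1;
    }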