Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!clements
From: clements@cs.utexas.edu (Paul C. Clements)
Newsgroups: comp.software-eng
Subject: optimizing compiler technology ==> help for SE?
Keywords: optimizing compilers, performance, performance-critical systems
Message-ID: <963@lychee.cs.utexas.edu>
Date: 9 Nov 89 19:40:32 GMT
Organization: U. Texas CS Dept., Austin, Texas
Lines: 175

About ten days ago, I posted a request for information that asked, roughly,
"How much can we rely on optimizing compiler technology to get back the
performance cost that usually comes with the use of sound software
engineering techniques?"  I promised to post a summary of responses.  I got
over two dozen answers.

Over twenty responses were what I call "sociological" or "anecdotal".  These
(a) told me why we ought to use good techniques _anyway_; (b) told me how to
coerce or fool "recalcitrant" decision-makers into using the techniques
_anyway_; (c) told me why performance wasn't important; or (d) provided
anecdotal evidence that the performance penalty associated with sound
engineering techniques is a myth, or is real but not very big, or is real
and is big.  To these people, I say thank you for writing; you raised good
points (at least for your application area), but the problem you addressed
is not the one I had in mind.

I explicitly stated that performance was VERY important in my hypothetical
system, and I was fascinated by the number of people who either ignored
that, or assumed that it didn't really matter "in this day and age."  In the
application I have in mind: (a) every byte counts; (b) every millisecond
counts; (c) performance requirements are well-defined, tight, non-negotiable,
and hard to meet; (d) a program failing to meet its performance requirements
is useless; and (e) getting new hardware is not a possible alternative.

In this environment, people in charge will CHOOSE -- a conscious,
well-informed, reasoned decision -- high performance over just about any
other advantage you can name that derives from application of what we would
call sound software engineering practices.  They see no choice.  For
doubters, one example of this application domain is embedded tactical
avionics systems for military aircraft.

Well, of course many (most?) optimization problems are NP-complete, if not
undecidable.  However, there is a shocking (to me) lack of formality with
regard to optimization work.  I was hoping to find something like "to solve
optimization problem P, technique T will handle all cases of the form F, and
produce optimal code.  For a case not of form F, T will produce code no
worse than what it was given, and maintain its correctness."  No such luck.
All the optimization literature I've seen is of the form "Here's a technique
to solve this problem.  It works like this.  We ran it, and discovered that
either (a) it produces better code than an optimizing compiler we were
competing against; or (b) it produces at least as good code as a competitor,
but does it a little faster."

Short summaries of four intriguing "technical" responses that I received
follow; you may inquire directly of the author for further information.
Thanks for everyone's help.

P. C. Clements
University of Texas at Austin

(130 lines follow)

==============================================================================
>From soi!chip@harvard.harvard.edu Wed Nov 1 09:04:22 1989

My company is currently working on a DARPA-funded 5-year project that has as
one of its main goals precisely your paragraph.  The project is the joint
design and prototyping of a wide-spectrum language and programming
environment.  One underlying idea is to design the environment and the
language together.  One of our design goals is to allow the programmer to
use _very_ high level constructs at no penalty...  We pay _no_ overhead for
procedure calls unless it is required by the program...  We allow the
programmer to specify the relative cost of time and space, and most of our
optimization methods compute costs relative to these parameters before
choosing what to do....
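[An aside from me: "no overhead for procedure calls unless it is required" is
the kind of claim an inlining compiler can make good on.  The C++ sketch
below is mine, not theirs -- the names Counter, increment, and tally are
invented for illustration, and nothing in it is specific to their system.
It only shows a call that a compiler is free to replace with the body of the
callee, so the abstraction costs nothing at run time.  Whether a given
compiler actually does so is, of course, exactly what my original posting
was asking.]

    #include <cstdio>

    // A small abstract data type hidden behind a procedural interface.
    // Counter, increment, value, and tally are invented names.
    class Counter {
    public:
        Counter() : count(0) {}
        void increment()   { ++count; }   // trivial body: a prime inlining candidate
        int  value() const { return count; }
    private:
        int count;
    };

    int tally(int n) {
        Counter c;
        for (int i = 0; i < n; ++i)
            c.increment();    // once inlined, this is just ++count; no call overhead
        return c.value();
    }

    int main() {
        std::printf("%d\n", tally(10));   // prints 10
        return 0;
    }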
==============================================================================
From mauney@cscadm.ncsu.edu Fri Nov 3 15:52:05 1989

...The additional problems posed by information-hiding techniques are likely
to be

a) Indirection
   The concrete representation of an abstract data type is likely to be
   based on a pointer...

b) procedure calls
   Manipulating an object of abstract type requires a procedure call, which
   is expensive.  But procedure calls can be in-lined, even across modules...

c) need for inter-module analysis
   Inter-module analysis for optimization is theoretically the same as any
   other inter-procedural analysis...

d) dynamic binding
   ...Lisp partisans claimed to have (this) under control, and many good
   languages don't have dynamic binding.

e) Generic code
   With parameterized types, such as generic packages in Ada, we have a
   problem: generate one piece of code that is general enough to be used in
   all instantiations, or macro-expand distinct pieces of code for each
   instantiation?  Here is the time-space tradeoff.  But I don't see a
   difficult compilation process, only a question of how to let the
   programmer specify which way to go.  [A C++ sketch of the two choices
   follows this response.]

It's really a matter of being willing to delay many of the optimizations
until the whole program is assembled...
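[To make point (e) concrete, here is a minimal C++ sketch of the two
choices.  Everything in it -- swap_any, swap_typed, the 64-byte buffer -- is
invented for illustration.  The void* version is one shared piece of code
that pays for its generality with indirection at run time; the template
version is expanded by the compiler into a separate, fully typed copy per
instantiation -- faster code, but more of it.  The sketch only shows where
the tradeoff lives; which way to go is the knob mauney suggests exposing to
the programmer.]

    #include <cassert>
    #include <cstddef>
    #include <cstring>

    // One general routine shared by every element type: the "time" cost is
    // the indirection (void* arguments and an element size passed at run time).
    void swap_any(void* a, void* b, std::size_t size) {
        unsigned char tmp[64];             // illustration only: assume small elements
        assert(size <= sizeof tmp);
        std::memcpy(tmp, a, size);
        std::memcpy(a, b, size);
        std::memcpy(b, tmp, size);
    }

    // Expanded per instantiation: the compiler emits a separate, fully typed
    // copy for each T actually used -- direct code, at a cost in code space.
    template <class T>
    void swap_typed(T& a, T& b) {
        T tmp = a;
        a = b;
        b = tmp;
    }

    int main() {
        int x = 1, y = 2;
        swap_any(&x, &y, sizeof(int));     // shared code, indirect
        swap_typed(x, y);                  // one copy per type, direct
        double p = 1.5, q = 2.5;
        swap_typed(p, q);                  // a second instantiation: more code
        return 0;
    }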
=============================================================================
>From preston@rice.edu Tue Oct 31 15:09:21 1989

Your example, minimization of storage:  Janet Fabri wrote a thesis on the
subject.  She later worked for IBM on the ECS compiler project...  She used
ideas that were similar to later work done with graph coloring for register
allocation.  However, "minimal (not just improved)" is certainly NP-Complete
and probably undecidable.  Does it matter?  I don't expect an assembly
programmer achieves minimal in any large program.

Next subject, time/space efficiency:  Compilers can't do much here.  ...the
real tradeoffs are made during the program (algorithm) design.  (However,
see Jack Schwartz's work on SETL, e.g. "Automatic data structure choice in a
language of very high level", CACM, December 1975.)

...I can talk about what's up with optimizers, especially here at Rice.
Classically, optimizers have worried about one procedure at a time.  About
10 years ago, this area started to settle down, though there is still room
for improvement.  This attention to single procedures has led a lot of
people to worry about procedure call overhead (and to build 2000-line
procedures).  Of course, others began to study how to optimize whole
programs.  The family tree looks something like:

    Spillman
    Barth - Allen - Burke (others?)
          \
           Banning - Cooper, Kennedy (Torczon, Callahan, Balasundarum, ...)

A second area that's opening up is analysis of arrays.  Classically,
optimizers make pretty bad assumptions about arrays.  Recently though, a lot
of work has been done to understand patterns of array access in loops
(dependence analysis).  The initial motivation was vectorization, but people
are beginning to apply the same techniques to scalar machines.

So, we're beginning to understand arrays.  What about records, linked
structures, persistent objects, coroutines, first class functions,
continuations, ...?

==============================================================================
>From cline@cheetah.ece.clarkson.edu Fri Nov 3 14:13:29 1989

...A more powerful boost to SE is: "Can we use good SE to produce what is
actually *faster* code?"  The basic notion that I'll present is that "more
high-level information means more chance for high-level optimization."  I
cite recent research which Doug Lea (dl@oswego.oswego.edu) and I
(cline@sun.soe.clarkson.edu) are doing with annotating C++ code.  We call
the system "A++" for "Annotated C++".

Benefits: Correctness.  Formal verification can be employed to check whether
what you *said* (the code) is consistent with what you *meant* to say (the
annotations).  Thus *lots* more static (compile-time) analysis can be done
on annotated code.

Another benefit, although not immediately obvious, is Optimization.  It
turns out that the additional semantic information can be employed by the
compiler to help generate better code.  For example, many runtime
consistency and exception tests are redundant; formal verification can be
used to *prove* them to be redundant, so they can be (automatically)
removed.  This reduces actual runtime testing to a minimum (or *nearly*
minimal, anyway) without changing the source code.  Thus A++ ("Annotated
C++") can be thought of as an "Exception Optimizer."

==============================================================================
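[Finally, a sketch of the "exception optimizer" idea.  I do not know A++'s
actual notation, so the annotations below are just comments and assert()s,
and every name (Table, at, append, sum) is invented.  The point is only the
mechanism: once a verifier proves that the loop index always satisfies
at()'s precondition, the runtime range test inside at() is provably
redundant in that caller and the compiler is free to drop it.  Done by hand
this is just deleting an assert; the interesting claim is that annotations
let it happen automatically and safely, without changing the source.]

    #include <cassert>

    // Invented example; the //@ lines stand in for annotations -- this is
    // NOT the real A++ syntax.
    class Table {
    public:
        //@ invariant: 0 <= used && used <= SIZE
        int at(int i) const {
            //@ requires: 0 <= i && i < used
            assert(0 <= i && i < used);      // runtime consistency test
            return data[i];
        }
        void append(int v) {
            //@ requires: used < SIZE
            assert(used < SIZE);
            data[used++] = v;
        }
        int size() const { return used; }
    private:
        enum { SIZE = 100 };
        int data[SIZE];
        int used = 0;
    };

    int sum(const Table& t) {
        int s = 0;
        for (int i = 0; i < t.size(); ++i)
            s += t.at(i);    // 0 <= i < t.size() holds on every iteration, so
                             // a verifier can prove at()'s test redundant here
                             // and the compiler can remove it automatically
        return s;
    }

    int main() {
        Table t;
        t.append(3);
        t.append(4);
        return sum(t) == 7 ? 0 : 1;
    }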