Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!exodus!rbbb.Eng.Sun.COM!chased
From: chased@rbbb.Eng.Sun.COM (David Chase)
Newsgroups: comp.arch
Subject: What the compiler won't do for you
Message-ID: <8658@exodus.Eng.Sun.COM>
Date: 26 Feb 91 19:40:01 GMT
References: <10244@dog.ee.lbl.gov> <1991Feb25.203629.5059@linus.mitre.org> <10278@dog.ee.lbl.gov> <22605:Feb2608:04:5391@kramden.acf.nyu.edu>
Sender: news@exodus.Eng.Sun.COM
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 74

(Even though I'm replying to Dan, I'll be polite.  Truth is stranger
than fiction, eh?)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Observation 2: Whoever writes the optimizer for FUBAR---let's call this
>guy Natas---*could* make every single FUBAR instruction available from
>Z. All he has to do is make sure that there's *some* language construct,
>perhaps ridiculously convoluted, that will compile to each instruction.

>Observation 3: Natas rarely does this. I have yet to see any compiler
>understand any portable expression of Hamming weights, even though this
>wouldn't be very difficult. Even in the occasional case where the
>compiler writer shows some evidence of having used the assembly language
>he's compiling for, he rarely tells programmers what the optimizer can
>and can't do.

>Let's assume that Natas is not such a devil, and manages not only to
>give his optimizer some way of using FUBAR's CXi operation and some way
>of using FUBAR's DREM operation, but also to *document* (gasp) the
>corresponding expressions in Z. Now Joe can write portable Z code that
>will run fast on FUBAR, taking advantage of the chip's instructions. All
>he has to do is follow Natas's instructions.

There's one problem with this: finite resources (if you or Herman
Rubin wants to pony up a dump-truck full of cash, maybe we can talk,
but I'll bet your resources are finite, too).  I think you'll agree
that no compiler writer should spend time optimizing for the
machine-specific case until most of the optimization has been done for
the portable (standard-conforming) case.  So, after we've taken care
of reduction in strength, global value numbering, constant
propagation, redundancy elimination, loop invariant code motion,
instruction selection, register allocation, scheduling, loop
unrolling, loop straightening, peephole improvements, software
pipelining, linear function test replacement, loop fusion,
stripmining, blocking, dead code elimination, tail call elimination,
leaf routine optimization -- oops, better algorithms appeared, so we
have to reimplement a couple of those -- oops, new machine, time to
fool around with the scheduling and instruction selection some more --
oops, time to do some fancy code placement to help out the cache --
oops, time to do interprocedural optimization.  I think you get the
picture.  Always, optimization efforts have to be directed towards
those things that will make the largest number of present and future
customers happy.  When all the work is done for C, Fortran, Pascal,
Modula, and C++, is it better to expose non-portable machine-specific
optimizations to the programmer, or should we look at extending the
optimizer to be useful to Lisp, Ada, Cobol, RPG-III, Prolog, and
Jovial?  Maybe we'd make people happier if the optimizer ran twice as
fast, or in half the memory.  Maybe we should make use of some of the
dataflow algorithms to provide debugging feedback to the user
("there's no exit from this loop; perhaps it won't terminate?").

Another difficulty with the scheme that you describe is that you
really don't want people to be writing their code in strange little
idioms because that will engage some magic widget in the optimizer.
It's portable, but weird.  That doesn't help readability or
debuggability, and it may not be portable into the future.  If someone
rewrites the optimizer, the last thing they want to do is support
weird little hacks like that.  If you must (and some people must),
then use something based on subroutine calls.  That has the advantage
that (1) if no machine support exists, it is easy to plug in portable
code and (2) machine support often exists for assembly-language
implementations (for instance, say "man inline" on a Sun).
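
To make the subroutine approach concrete, here is a rough sketch in C
for the Hamming weight case mentioned above.  The names bit_count,
bit_count_portable, and HAVE_FAST_BIT_COUNT are made up for
illustration; they are not part of any standard library or of any
compiler I know of.  The idea is only that callers see one ordinary
function, and the build decides whether a machine-specific version
stands behind it.

```c
/* Hypothetical names throughout: bit_count, bit_count_portable, and
   HAVE_FAST_BIT_COUNT are illustrative, not an existing interface. */

/* Portable fallback.  Each iteration clears the lowest set bit,
   so the loop runs once per 1 bit in x. */
static unsigned bit_count_portable(unsigned long x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;   /* drop the lowest set bit */
        n++;
    }
    return n;
}

#if defined(HAVE_FAST_BIT_COUNT)
/* Assumption: on a machine with suitable support, a separately
   supplied implementation (assembly language, say) provides this
   symbol, and the portable body above goes unused. */
extern unsigned bit_count(unsigned long x);
#else
/* No machine support: plug in the portable code. */
unsigned bit_count(unsigned long x)
{
    return bit_count_portable(x);
}
#endif
```

A caller just writes bit_count(mask) and never needs to know which
version it got.  If the optimizer is rewritten, nothing in the calling
code depends on an idiom being recognized.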

Of course, there are some things that "ought" to be recognized because
they are portable, but it still isn't clear that they are needed, or
worth the cost.  For instance, on the Sparc we *could* spill register
%i7 (return PC) and use that register for other purposes in the main
body of a subroutine.  Fine, except that we'd break all the debuggers,
including adb.  That adds big costs to the final debugging that goes
on before a product (ours, or some software vendor's) is shipped.
Probably not worth it.

David Chase
Sun

