Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!spdcc!ima!cfisun!stardent!wright
From: wright@stardent.Stardent.COM (David Wright @stardent)
Newsgroups: comp.arch
Subject: Re: Bloat costs
Message-ID: <1990Jun27.011149.2406@Stardent.COM>
Date: 27 Jun 90 01:11:49 GMT
Distribution: na
Organization: Stardent Computer, Newton MA
Lines: 66


In article <62777@sgi.sgi.com>, karsh@trifolium.esd.sgi.com (Bruce Karsh)
writes:

>> -- me
> - Karsh

>>But I must take issue with statements like:
>>In order to make the FFT be
>>really efficient, it's necessary to pretabulate trig tables and powers of 2, 
>>use tricky methods to perform the multiplications, arrange the code so that
>>the integer and floating point calculations are overlapped... etc.  When you
>>do all that, you wind up with a pretty fast, but very unclear program.

>I've never seen an optimizing compiler that will do any of these optimizations.
>It's a widely held belief that compiler optimizations can get the most out
>of an algorithm, but it's hardly ever true.  There are optimizations that
>compilers can do and optimizations that they can't do.  These are in the
>later set.

Whoops.  My fault for being unclear.  I didn't mean to imply that compilers
were smart enough to figure out how to pretabulate the trig tables.  Powers
of two, maybe.  However, you're going to have to be more specific about
what you mean by overlapped adds and multiplies that the compiler isn't
smart enough to reorganize.  Seems to me that with the advent of superscalar
chips, for example, the compilers damn well better be smart enough to do
this.  Or are we not talking about the same thing?

>Is there a C compiler which will recognize a complex multiplication and
>generate code which needs 3 multiplications instead of four?  Perhaps they
>ought to exist, but the don't.  I don't believe that compiler techniqes yet
>exist to make such a compiler.

But when you say "complex multiplication", exactly what do you mean?  An
example would be illuminating here.

>Assembly language is just one of the tools for
>solving these problems.  There seems to be an attitude that these
>problems aren't important, or are too "messy" to be dealt with.  Sometimes
>the result of this attitude is uncompetively slow systems.

Could be.  But I'm not trying to argue against the use of assembler where
it's needed; the question is "where is it needed?"  As an OS programmer,
I'm apt to be in situations where assembler is the only thing I can use,
and sometimes, as in your example of bcopy, it's just plain faster to do
it in assembler.  But obviously we're only going to do this where it 
makes a difference that matters.

>Quicksort is nowhere near optimal.  It's worst-case behavior is O(n*n).

Golly gee, no kidding.  But its expected behavior (unless it's really
stupidly coded) is O(n*log(n)).  I'm not arguing that the programmer doesn't
need to understand the algorithm, nor am I arguing that for the utterly
fastest behavior, the programmer can avoid looking at what the compiler
produces (I do this myself).  However, sometimes the result of this is
a note to the compiler developers that results in faster code.  And
sometimes it's the pleasant discovery that the compiler did just what I
wanted it to do.

And my compiler can beat up your compiler :-)

  -- David Wright, not officially representing Stardent Computer Inc
     wright@stardent.com  or  uunet!stardent!wright

After all, this is still the land of opportunity.  If you know where
to look.  -- Jack Douglas