Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!dogie.macc.wisc.edu!decwrl!sgi!karsh@trifolium.esd.sgi.com
From: karsh@trifolium.esd.sgi.com (Bruce Karsh)
Newsgroups: comp.arch
Subject: Re: Bloat costs
Message-ID: <63034@sgi.sgi.com>
Date: 28 Jun 90 08:55:32 GMT
References: <1990Jun27.011149.2406@Stardent.COM>
Sender: news@sgi.sgi.com
Reply-To: karsh@trifolium.sgi.com (Bruce Karsh)
Distribution: na
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 68

In article <1990Jun27.011149.2406@Stardent.COM> wright@stardent.Stardent.COM (David Wright @stardent) writes:
>Whoops.  My fault for being unclear.  I didn't mean to imply that compilers
>were smart enough to figure out how to pretabulate the trig tables.  Powers
>of two, maybe.

Maybe they could, but right now they don't.

>However, you're going to have to be more specific about
>what you mean by overlapped adds and multiplies that the compiler isn't
>smart enough to reorganize.

The MIPS processor can overlap a couple of floating point calculations with
an integer operation.  The MIPS optimizing compiler does not alway generate
optimal overlap for these cases.  Perhaps someday someone will generate
a compiler which does this optimally, but right now the MIPS compiler doesn't.

>  Seems to me that with the advent of superscalar
>chips, for example, the compilers damn well better be smart enough to do
>this.  Or are we not talking about the same thing?

We'll see how good a job the superscalar compilers can do.

>But when you say "complex multiplication", exactly what do you mean?  An
>example would be illuminating here.

I mean a multiplication of two complex numbers.  For example:

	(e + if) = (a + jb) * (c + jd)

can be calculated as:

	e = a*c - b*d;
	f = a*d + b*c;

This is the clear, obvious way to code this calculation.  It has four
multiplies and two summations.

It can also be coded as:

	u = (a-b)*d;
	e = u + a*(c-d);
	f = u + b*(c+d);

Which has only 3 multiplies.  (But 5 summations).

Some architectures (e.g. SPARC) take much longer to multiply than to add.

I know of no C compiler which will recognize a complex multiplication and
rewrite it in the faster form.  (It's not even clear that you'd want it to).
If you are interested in making these complex multiplies occur very rapidly,
then the later method should be used since it is faster.  It's not as clear
though.

>Golly gee, no kidding.  But its expected behavior (unless it's really
>stupidly coded) is O(n*log(n)).  I'm not arguing that the programmer doesn't
>need to understand the algorithm, nor am I arguing that for the utterly
>fastest behavior, the programmer can avoid looking at what the compiler
>produces (I do this myself).  However, sometimes the result of this is
>a note to the compiler developers that results in faster code.  And
>sometimes it's the pleasant discovery that the compiler did just what I
>wanted it to do.

I fully agree.  But the original posting stated that it was enough to just
right clear code because the optimizer will make the most of it.  In general,
it wont.  Maybe someday.

			Bruce Karsh
			karsh@sgi.com