Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.sys.sgi
Subject: Re: Where's the SPARK in my SPARC????
Message-ID: <3038@charon.cwi.nl>
Date: 27 Feb 91 08:35:49 GMT
References: <1991Feb21.120049.5626@jarvis.csri.toronto.edu> <1991Feb26.202910.27944@pixar.uucp>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 47

In article <1991Feb26.202910.27944@pixar.uucp> mccoy@pixar.uucp (Dan McCoy) writes:
 > Aside from the register windows that others mentioned, another
 > place that SPARC often loses ground versus MIPS processors (like SGI)
 > is integer multiplies.  Unless they snuck them into the SparcStation 2
 > when I wasn't looking, SPARC still does multiply in software whereas
 > MIPS processors have hardware for that.
 > 
It is true that MIPS processors do the integer multiply in hardware while
the SPARC does it in software (using multiply step instructions).  However,
when you look at the number of cycles needed for a multiply there is in general not
much difference.  The MIPS mult instruction takes quite some time to
complete (11 cycles) and you have to pick up the result.  Of course you
can do other things during the multiply, but you will find that in general
you do not have enough instructions to fill all those cycles.  See the
examples in Kane's book, where you find 6 to 11 cycle interlocks after a
multiply.  On the other hand, Sun software is clever enough to use a
short multiply sequence if one of the operands is small, bringing down
the time needed from 34 cycles for a full multiply to 18 or 14 cycles.
So not much difference there.

Also register windows do not matter very much.  In practice there is no
tremendous slowdown from register windows.  You will only get problems
if your program repeatedly calls a (large) nested sequence of small
routines.  And then you will only see that system time goes up because
of register window overflow/underflow traps.  On the other hand, there
is no need to store a bunch of local variables explicitly on the stack.
I think that the net result is that there is not much difference.

The main point, as I have mentioned before and have now also shown, is the
distinction between honoring prototypes (MIPS C compiler) and not
honoring them (Sun C compiler).  In the latter case specifying that all
floating point operations must be done in single precision is useless for
routines that take floating point parameters (because of the implicit
promotion to double that is mandated by K&R).  I picked up the original
source and changed it so that it can be compiled with Sun's -fsingle2 flag
(pass all floating point parameters as float, not as double).  Changes were
needed because with that flag all floating point arguments to library
routines must be cast explicitly to double.  I did timings on an SLC and
got the following results (where the flags given are in addition to a
bunch of flags common to all tests):
flags                     time
cc                        2m33.16s
cc -fsingle               1m36.00s
cc -fsingle -fsingle2     0m20.46s
Some improvement I would say.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl