Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.sys.sgi
Subject: Re: Where's the SPARK in my SPARC????
Message-ID: <3038@charon.cwi.nl>
Date: 27 Feb 91 08:35:49 GMT
References: <1991Feb21.120049.5626@jarvis.csri.toronto.edu> <1991Feb26.202910.27944@pixar.uucp>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 47

In article <1991Feb26.202910.27944@pixar.uucp> mccoy@pixar.uucp (Dan McCoy) writes:
 > Aside from the register windows that others mentioned, another
 > place that SPARC often loses ground versus MIPS processors (like SGI)
 > is integer multiplies. Unless they snuck them into the SparcStation 2
 > when I wasn't looking, SPARC still does multiply in software whereas
 > MIPS processors have hardware for that.
 >
It is true that MIPS processors do the integer multiply in hardware while the SPARC does it in software (using multiply step instructions). However, when you look at the number of cycles needed to do a multiply there is in general not much difference. The MIPS mult instruction takes quite some time to complete (11 cycles) and you have to pick up the result. Of course you can do other things during the multiply, but you will find that in general you do not have enough instructions to fill all those cycles. See the examples in Kane's book where you find 6 to 11 cycle interlocks after a multiply. On the other hand, Sun software is clever enough to use a short multiply sequence if one of the operands is small, bringing down the time needed from 34 cycles for a full multiply to 18 or 14 cycles. So not much difference there.

Also register windows do not matter very much. In practice there is no tremendous slowdown from register windows. You will only get problems if your program repeatedly calls a (large) nested sequence of small routines. And then you will only see that system time goes up because of register window overflow/underflow traps.
On the other hand, there was no need to store a bunch of local variables explicitly on the stack. I think that the net result is that there is not much difference.

The main point, as I have mentioned before and have now also proven, is the distinction between honoring prototypes (MIPS C compiler) and not honoring them (Sun C compiler). In the latter case, specifying that all floating point operations must be done in single precision is useless for routines that take floating point parameters (because of the implicit promotion to double that is mandated by K&R).

I did pick up the original source and changed it so as to allow Sun's -fsingle2 flag (pass all floating point parameters as float, not as double). Changes were needed because all floating point parameters to library routines must explicitly be cast to double. I did timings on an SLC and got the following results (where the flags given are in addition to a bunch of flags common to all tests):

    cc                       2m33.16s
    cc -fsingle              1m36.00s
    cc -fsingle -fsingle2    0m20.46s

Some improvement I would say.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
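
To illustrate the promotion issue mentioned above, here is a small sketch in C. It is not taken from the original source; the function names square_proto and square_promoted are made up for the illustration. Since old-style (unprototyped) definitions are no longer accepted by every current compiler, the sketch uses a variadic function as a stand-in: the same default argument promotions apply to its trailing arguments, so a float that is passed in arrives as a double, just as it would in a K&R-style call.

```c
#include <stdarg.h>

/* With a prototype in scope the argument is passed as a float,
 * so the multiply can be done in single precision. */
static float square_proto(float x)
{
    return x * x;
}

/* Stand-in for a call without a prototype (an assumption made for
 * this illustration): trailing arguments of a variadic function
 * undergo the default argument promotions, so a float argument is
 * converted to double by the caller and must be read back as double. */
static double square_promoted(int tag, ...)
{
    va_list ap;
    double x;

    va_start(ap, tag);
    x = va_arg(ap, double);   /* reading it as float would be an error */
    va_end(ap);
    return x * x;             /* double precision multiply */
}
```

Calling square_proto(0.5f) keeps everything in float, while square_promoted(1, 0.5f) converts the argument to double before the call. A compiler that ignores prototypes is in the second situation for every routine that takes floating point parameters, whatever single precision flag is given for the arithmetic itself.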