Xref: utzoo sci.math:9103 comp.arch:12910 comp.lang.c:24770 comp.sources.wanted:9961
Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!snorkelwacker!spdcc!merk!alliant!linus!bs
From: bs@linus.UUCP (Robert D. Silverman)
Newsgroups: sci.math,comp.arch,comp.lang.c,comp.sources.wanted
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <85138@linus.UUCP>
Date: 29 Dec 89 13:18:05 GMT
References: <84768@linus.UUCP> <15418@vlsisj.VLSI.COM>
Reply-To: bs@gauss.UUCP (Robert D. Silverman)
Organization: The MITRE Corporation, Bedford MA
Lines: 65

In article <15418@vlsisj.VLSI.COM> davidc@vlsisj.UUCP (David Chapman) writes:
:In article <84768@linus.UUCP> bs@linus.mitre.org (Robert D. Silverman) writes:
:>Does anyone have, or know of, software for the SPARC [SUN-4] that will
:>perform the following:
:>
:> [standard multiply and divide]
:>
:>The SPARC is brain dead [as were its designers] when it comes to doing
:>integer arithmetic. It can't multiply and it can't divide.
:
:There should be instructions on the order of "multiply step" and "divide
:step", each of which will do one of the 32 adds/subtracts and then shift.

There is a multiply step instruction. There is no such support for
division. It can take 200+ cycles to do a division on the SPARC
[worst case]. A 32 x 32 bit unsigned multiply takes 45-47 cycles.

Programs that have a significant number of multiplies and divides can
run SLOWER on a SPARC than on a SUN-3. [I have such!] ONLY because of
the slow multiply/divides.

:I'm not particularly fond of the SPARC architecture (don't like register
:windows), but this is a theoretical viewpoint and is not based on any
:direct exposure to assembly-language programming for it (translation:
:sorry, I can't give you any more help).
:
:Neither SPARC nor its designers were brain-dead when it was built. It's just

I didn't say they were. I said they were with respect to arithmetic.
I stand by that assertion. Most programs may not need multiply/divide
in hardware. However, for those that do require it, not having it is a
real KILLER of algorithms.

:that it is difficult to get multiplication and division (especially the
:latter) to run in 1 or 2 clock cycles. All instructions are supposed to

I know of quite a few DSP chips that do multiplies in 1 cycle. Divides
take a little longer [but not much; Ernie Brickell of SANDIA invented a
hardware divide that works much faster than standard
conditional-shift/subtract].

:execute in the ALU in 1 cycle; if the multiply and divide instructions take
:more time then the front of the processor pipeline has to be able to stall
:and this added complexity will slow down the entire processor.
:
:Thus they provide you with the tools to do your own multiply and divide.

See above. They are too slow.

:One of the benefits is that a compiler can optimize small multiplies and
:divides to make them execute quicker (i.e. multiply by 10 takes 4 steps

That's fine for multiply-by-constant. Most programs that NEED
multiply/divide are multiplying variables.

:P.S. Don't write a loop on the order of "MULSTEP, DEC, BNZ" or it will be
:     incredibly slow. Unroll the loop 4 or 8 times (MULSTEP, MULSTEP,
:     MULSTEP, MULSTEP, SUB 4, BNZ). Branches are expensive.

Agreed. In fact my 32 x 32 bit multiply consists of 32 calls to multstep
and no looping at all. It is still slow.
-- 
Bob Silverman
#include
Internet: bs@linus.mitre.org; UUCP: {decvax,philabs}!linus!bs
Mitre Corporation, Bedford, MA 01730
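
For readers unfamiliar with the technique under discussion, here is a
rough C sketch of a shift-and-add multiply built from 32 "steps", plus
a multiply-by-constant done with shifts and an add. It only illustrates
the general idea; it does not model the actual SPARC multiply step
instruction (which works through the condition codes and a separate
register), and the names mul_step, mul32x32 and times10 are invented
for this example.

```c
#include <stdint.h>

/* One shift-and-add step (hypothetical helper, not the SPARC instruction):
 * if the low bit of the multiplier is set, add the shifted multiplicand
 * to the accumulator; then shift both operands for the next step. */
static void mul_step(uint64_t *acc, uint64_t *mcand, uint32_t *mplier)
{
    if (*mplier & 1u)
        *acc += *mcand;
    *mcand <<= 1;
    *mplier >>= 1;
}

/* 32 x 32 -> 64 bit unsigned multiply as 32 steps, unrolled by 4
 * so that only 8 loop branches are taken. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    uint64_t mcand = a;
    int i;

    for (i = 0; i < 32; i += 4) {
        mul_step(&acc, &mcand, &b);
        mul_step(&acc, &mcand, &b);
        mul_step(&acc, &mcand, &b);
        mul_step(&acc, &mcand, &b);
    }
    return acc;
}

/* Multiply by the constant 10 without a general multiply:
 * 10*x = 8*x + 2*x. The result wraps modulo 2^32. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

The constant case needs only two shifts and an add, which is why a
compiler can make it cheap. The variable case still has to walk every
bit of the multiplier, however the loop is unrolled.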