Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!samsung!zaphod.mps.ohio-state.edu!mips!orac!cprice
From: cprice@mips.COM (Charlie Price)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <34000@mips.mips.COM>
Date: 29 Dec 89 00:42:03 GMT
References: <84768@linus.UUCP> <8840004@hpfcso.HP.COM> <84983@linus.UUCP>
Sender: news@mips.COM
Reply-To: cprice@mips.COM (Charlie Price)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 74

> In article <84983@linus.UUCP> bs@linus.UUCP (Robert D. Silverman) writes:
> >In article <8840004@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
> >
> >Huh? The MIPS-R3000 does integer multiplies in hardware in just a
> >couple of cycles. The SPARC takes a minimum of 47 cycles using
> >its so-called multiply-step function to multiply two integers.
> >Division is even worse by almost an order of magnitude.

The MIPS R2000, R3000, and R6000 do indeed have multiply and divide
instructions, but your idea of the time they take is not quite right.

    proc     mult   div
    R2000     12     35
    R3000     12     35

    (times in cycles to put the result in the special registers)

I don't believe that the R6000 numbers are public yet, but I'm willing
to say that they aren't better, in cycle counts, than the R3000.

The operations work the following way:

Multiply:  Rs x Rt -> LO, HI
    multiplies two general-purpose (32-bit) registers and produces a
    64-bit result that is put into a pair of special-purpose registers
    named LO and HI (low-order word in LO, high-order word in HI).

Divide:    Rs / Rt -> LO, HI
    divides one 32-bit general register by another 32-bit general
    register, producing a 32-bit quotient (in LO) and a 32-bit
    remainder (in HI).

The LO/HI register pair is used only for the results of multiply and
divide. Special instructions (mflo, mfhi, mtlo, mthi) exist to move data
from (and to) each of these registers.

The multiply/divide unit operates autonomously. After a multiply or
divide is issued, other instructions that don't use the unit continue
to issue and execute.
At some point the program wants the result of the operation, presumably after everything that can be done in parallel is done, and you issue a move-from-LO (or HI) instruction to move the result into a general register. If the operation is not yet done, the instruction interlocks and the processor is stalled till the result is available. The times above are the time needed to produce the result. Depending on what you mean by "how fast is X", you might want to include the instruction(s) necessary to get the result from LO (and/or HI) and put it (or them in a general register(s). This would add 1 cycle for one 32-bit result register or 2 cycles if you wanted the contents of both LO and HI. These operation times aren't especially fast. Being able to execute the op in parallel with "regular" instructions makes the effect of the operation length a little bit less, though the amount that helps depends a lot on what you can do in parallel with any multiply or divide. Finding 35 real instructions to occupy your time during a divide seems somewhat unlikely. One can make these operations faster by spending more hardware on them. The existing implementations have not chosen (or been able to) spend a lot of die area on these functions. For the current MIPS' processors it is generally worth figuring out whether you can get the multiply/divide job done with some short sequence of faster instructions rather than use the general instruction. As John Mashey has mentioned, multiply and divide were included in the MIPS instruction set because reasonably-fast general multiply and divide, though unnecessary for many applications in our architecture benchmark suite, are very important for some of them. -- Charlie Price cprice@mips.mips.com (408) 720-1700 MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086