Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!pyramid!prls!mips!cprice
From: cprice@mips.COM (Charlie Price)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC (was The NeXT problem)
Summary: R2000/3000 instr timing
Message-ID: <7472@winchester.mips.COM>
Date: 2 Nov 88 01:21:24 GMT
References: <156@gloom.UUCP> <28200218@urbsdc> <4759@pdn.UUCP>
Reply-To: cprice@mips.UUCP (Charlie Price)
Organization: MIPS Computer Systems, Sunnyvale, CA 94086-3650
Lines: 66

In article <4759@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>
>The 88k does a 32-bit integer multiply in 4 cycles (r3000 takes 13
>cycles, I believe).  A 32-bit integer divide takes the 88k 39 cycles
>(r3000 takes 36 cycles, I believe).  Of course, if either of the
>division operands is negative (signed division opcode), the 88k has to
>trap to a software routine to finish the division.  In other words,
>not all RISCs are wimps just because they don't have "complex
>instructions".
>Alan Lovejoy; alan@pdn; 813-530-8241; Paradyne Corporation: Largo, Florida.

The MIPS R2000 and R3000 have integer multiply/divide instructions,
but they are unlike the other main CPU instructions.  The source
operands are in general-purpose registers, and the result (a 64-bit
product, or a 32-bit quotient and a 32-bit remainder) is written to a
special pair of registers named HI and LO.  There are instructions
(MFHI, MFLO) to move from HI and LO to a general register.

So why do it this (seemingly odd) way?
From the architecture spec:

    Multiply and divide operations are performed by a separate,
    autonomous execution unit.  After a multiply or divide operation
    is started, execution of other instructions may continue in
    parallel.  The multiply/divide unit continues to operate during
    cache miss and other delaying cycles in which no instructions are
    executed.  The number of cycles required for multiply/divide
    operations is implementation-dependent.
    The MFHI and MFLO instructions are interlocked so that any attempt
    to read them before operations have completed will cause execution
    of instructions to be delayed until the operation finishes.

    The table below gives the number of cycles required between a
    MULT, MULTU, DIV or DIVU operation and a subsequent MFHI or MFLO
    operation, in order that no interlock or stall occurs.

                  MULT   MULTU   DIV   DIVU
        R2000      12     12      33    33
        R3000      12     12      33    33

Clearly, in order to do something useful you need to pick up at least
one 32-bit portion of the result, so in the best case you get a
13-cycle multiply and a 34-cycle divide.  If a stall occurs, it may
complicate restarting the pipeline and add an additional cycle.

By the way, it is worth noting that the 88000 4-cycle multiply
mentioned above only generates a 32-bit result...

The "why" of the MIPS architecture is that integer multiply/divide is
a sort-of coprocessor.  When a full multiply is necessary, it can be
done faster than with software alone, and it may be possible to get
other useful work done while waiting for the result.  All of the work
determining that this was a worthwhile feature to add to the
architecture was done long before I came to MIPS, so I can't comment
on the basis for this decision (perhaps mash will comment on that).

In practice, many things that seem to require multiply instructions
get turned into some sequence of inline shifts and adds.  Obviously
the compiler makes some sort of decision about which is "better" to
use.
-- 
Charlie Price    cprice@mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086
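A minimal Python model of the HI/LO result convention described in the post.  It assumes the standard MIPS assignment (a multiply's upper 32 bits go to HI and its lower 32 bits to LO; a divide leaves the quotient in LO and the remainder in HI) and ignores timing entirely.  The function names just mirror the opcodes and are illustrative, not any real simulator's API.  MFHI and MFLO then amount to picking one element of the returned pair; a multiply that produces only a 32-bit result corresponds to keeping LO alone.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(v):
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def multu(rs, rt):
    """MULTU: unsigned 32x32 -> 64-bit product, returned as (HI, LO)."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


def mult(rs, rt):
    """MULT: signed 32x32 -> 64-bit product, returned as (HI, LO) bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return product >> 32, product & MASK32


def divu(rs, rt):
    """DIVU: unsigned divide; the remainder goes to HI, the quotient to LO."""
    q, r = divmod(rs & MASK32, rt & MASK32)
    return r, q


def div(rs, rt):
    """DIV: signed divide truncating toward zero (the remainder takes the
    dividend's sign).  Returns (HI, LO) = (remainder, quotient) as 32-bit
    patterns.  Division by zero raises here; on the hardware the result is
    undefined."""
    a, b = to_signed32(rs), to_signed32(rt)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b
    return r & MASK32, q & MASK32


# MFHI/MFLO correspond to choosing one half of the pair.
hi, lo = multu(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))  # 0xfffffffe 0x1
```

Returning both halves as a pair mirrors the fact that one MULT or DIV fills both HI and LO, even if the program only ever reads one of them.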
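The interlock rule and the R2000/R3000 table can be turned into a small cost model.  This sketch assumes one instruction issues per cycle, that the compiler schedules some number of independent instructions between the multiply/divide and the MFHI/MFLO, and that there are no cache misses (since the unit keeps running during misses, they would tend to shorten the wait).  The optional `restart_penalty` is a hypothetical parameter for the extra cycle the post says a stall may add.

```python
# Cycles that must separate MULT/MULTU/DIV/DIVU from a later MFHI/MFLO
# for no interlock to occur (the R2000 and R3000 values are identical).
NO_STALL_GAP = {"MULT": 12, "MULTU": 12, "DIV": 33, "DIVU": 33}


def best_case_latency(op):
    """Gap plus one cycle for the MFHI/MFLO that picks up the result:
    13 for a multiply, 34 for a divide, counted as in the post."""
    return NO_STALL_GAP[op] + 1


def stall_cycles(op, independent_instrs, restart_penalty=0):
    """Cycles the MFHI/MFLO waits when `independent_instrs` single-cycle
    instructions are scheduled after `op`.  Assumes one instruction per
    cycle and no cache misses; restart_penalty is a hypothetical extra
    cost charged only when a stall actually happens."""
    wait = max(0, NO_STALL_GAP[op] - independent_instrs)
    return wait + restart_penalty if wait else 0


for k in (0, 6, 12):
    print("MULT with", k, "independent instructions:",
          stall_cycles("MULT", k), "stall cycles")
```

Under these assumptions every independent instruction the compiler can find is a cycle of useful work overlapped with the multiply, which is the "sort-of coprocessor" benefit the post describes.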
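A sketch of turning a multiply by a constant into inline shifts and adds, assuming the simplest possible scheme: add one shifted copy of x for each set bit of the constant, wrapping to 32 bits.  Real compilers use cleverer sequences (for example x*7 as (x<<3) - x), and `prefer_shift_add` with its 13-cycle threshold is a hypothetical cost comparison, not a description of the MIPS compiler's actual heuristic.

```python
MASK32 = 0xFFFFFFFF


def shift_add_plan(c):
    """Shift amounts for the set bits of the 32-bit constant c:
    x * c == sum(x << s for s in plan), modulo 2**32."""
    return [s for s in range(32) if (c >> s) & 1]


def mul_const_shift_add(x, c):
    """Compute (x * c) mod 2**32 using only shifts and adds."""
    acc = 0
    for s in shift_add_plan(c):
        acc = (acc + (x << s)) & MASK32
    return acc


def op_count(c):
    """Instructions in the naive sequence: one shift per nonzero shift
    amount plus one add per term after the first."""
    plan = shift_add_plan(c)
    shifts = sum(1 for s in plan if s != 0)
    adds = max(0, len(plan) - 1)
    return shifts + adds


def prefer_shift_add(c, mult_cycles=13):
    """Hypothetical rule: use shifts/adds when the sequence (one cycle per
    instruction) beats the best-case 13-cycle MULT + MFLO."""
    return op_count(c) < mult_cycles


print(mul_const_shift_add(5, 10), op_count(10), prefer_shift_add(10))  # 50 3 True
```

Constants with few set bits (10 needs two shifts and one add) win easily; a dense constant such as 0xFFFFFFFF needs 62 instructions in this naive scheme, so the multiply unit is the better choice there.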