Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!pyramid!prls!mips!cprice
From: cprice@mips.COM (Charlie Price)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC (was The NeXT problem)
Summary: R2000/3000 instr timing
Message-ID: <7472@winchester.mips.COM>
Date: 2 Nov 88 01:21:24 GMT
References: <156@gloom.UUCP> <28200218@urbsdc> <4759@pdn.UUCP>
Reply-To: cprice@mips.UUCP (Charlie Price)
Organization: MIPS Computer Systems, Sunnyvale, CA   94086-3650
Lines: 66

In article <4759@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>
>The 88k does a 32-bit integer multiply in 4 cycles (r3000 takes 13
>cycles, I believe).  A 32-bit integer divide takes the 88k 39 cycles
>(r3000 takes 36 cycles, I believe).  Of course, if either of the
>division operands is negative (signed division opcode), the 88k has to
>trap to a software routine to finish the division.  In other words,
>not all RISCs are wimps just because they don't have "complex
>instructions". 
>Alan Lovejoy; alan@pdn; 813-530-8241; Paradyne Corporation: Largo, Florida.

The MIPS R2000 and R3000 have integer multiply/divide instructions,
but they are unlike the other main CPU instructions.
The source operands are in general purpose registers and
the result (64-bit product or 32-bit quotient and 32-bit remainder)
is written to a special pair of registers named HI and LO.
There are instructions (MFHI and MFLO) to move from HI and LO to a
general register.
So why do it this (seemingly odd) way?
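
As an aside, the register semantics just described can be sketched
in C.  This is only an illustrative model, not anything from the
MIPS documentation: the type hilo_t and the model_* function names
are made up for this example.  The quotient-in-LO, remainder-in-HI
placement is how the R2000/R3000 divide behaves; the result of a
divide by zero is not modeled (the sketch just returns zeros as a
placeholder).

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair (names are illustrative). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* MULT: signed 32x32 multiply; upper 32 bits of the 64-bit product
   go to HI, lower 32 bits to LO. */
static hilo_t model_mult(int32_t rs, int32_t rt)
{
    uint64_t product = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_t r;
    r.hi = (uint32_t)(product >> 32);
    r.lo = (uint32_t)(product & 0xFFFFFFFFu);
    return r;
}

/* MULTU: same as above, but the operands are treated as unsigned. */
static hilo_t model_multu(uint32_t rs, uint32_t rt)
{
    uint64_t product = (uint64_t)rs * (uint64_t)rt;
    hilo_t r;
    r.hi = (uint32_t)(product >> 32);
    r.lo = (uint32_t)(product & 0xFFFFFFFFu);
    return r;
}

/* DIVU: unsigned divide; quotient goes to LO, remainder to HI.
   Division by zero is NOT modeled: zeros are a placeholder only. */
static hilo_t model_divu(uint32_t rs, uint32_t rt)
{
    hilo_t r = { 0u, 0u };
    if (rt != 0u) {
        r.lo = rs / rt;
        r.hi = rs % rt;
    }
    return r;
}
```

In this model an MFLO after model_mult corresponds to reading the
lo field, which is all a 32-bit multiply needs; MFHI only matters
when the full 64-bit product or the remainder is wanted.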

From the architecture spec:

  Multiply and divide operations are performed by a separate,
  autonomous execution unit.  After a multiply or divide operation
  is started, execution of other instructions may continue in parallel.
  The multiply/divide unit continues to operate during cache miss and
  other delaying cycles in which no instructions are executed.

  The number of cycles required for multiply/divide operations is
  implementation-dependent.  The MFHI and MFLO instructions are
  interlocked so that any attempt to read them before operations
  have completed will cause execution of instructions to be delayed
  until the operation finishes.

  The table below gives the number of cycles required between a
  MULT, MULTU, DIV or DIVU operation and a subsequent MFHI or MFLO
  operation, in order that no interlock or stall occurs.

		MULT	MULTU	DIV	DIVU
  R2000		12	12	33	33
  R3000		12	12	33	33

Clearly, to do anything useful you need to pick up at least one
32-bit portion of the result with MFHI or MFLO, so in the best case
you get a 13-cycle multiply and a 34-cycle divide.  If a stall
occurs, it may complicate restarting the pipeline and add an
additional cycle.
By the way, it is worth noting that the 88000 4-cycle multiply
mentioned above only generates a 32-bit result...
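
The arithmetic behind those figures can be sketched as a small C
model.  This is my reading of the table, not an official formula:
it assumes every instruction placed between the multiply/divide
and the MFHI/MFLO takes exactly one cycle, and it ignores the
possible extra pipeline-restart cycle.  The names MULT_LATENCY,
DIV_LATENCY, stall_cycles and cycles_through_move are made up for
the example.

```c
/* Cycles needed between the operation and MFHI/MFLO to avoid a stall,
   taken from the table above (same for R2000 and R3000). */
enum { MULT_LATENCY = 12, DIV_LATENCY = 33 };

/* Interlock stall seen by MFHI/MFLO when `filler` single-cycle
   independent instructions were placed after the MULT/DIV.
   Assumption: one cycle per filler instruction, no cache misses. */
static int stall_cycles(int latency, int filler)
{
    return (filler >= latency) ? 0 : latency - filler;
}

/* Cycles counted from just after the MULT/DIV issues through the
   MFHI/MFLO itself: filler work, any stall, plus one for the move. */
static int cycles_through_move(int latency, int filler)
{
    return filler + stall_cycles(latency, filler) + 1;
}
```

With no filler instructions, cycles_through_move gives 13 for
MULT_LATENCY and 34 for DIV_LATENCY, matching the best-case
figures.  With, say, 5 useful instructions after a MULT the total
is still 13, but only 7 of those cycles are stalls.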

The "why" of the MIPS architecture is that integer multiply/divide
is a sort of coprocessor.
When a full multiply is necessary, it can be done faster than
in software alone, and it may be possible to get other useful
work done while waiting for the result.
All of the work determining that this was a worthwhile feature
to add to the architecture was done long before I came to MIPS, so
I can't comment on the basis for this decision (perhaps mash
will comment on that).

In practice, many things that seem to require a multiply instruction
get turned into some inline sequence of shifts and adds.
The compiler decides, case by case, which of the two is "better"
to use.
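
To illustrate the shift-and-add idea (this is a generic sketch,
not the actual code sequence or heuristic of the MIPS compilers),
here is a multiply by the constant 10 and a general version that
adds one shifted copy of x for every set bit of the multiplier.
The function names times10 and mul_shift_add are made up for the
example; unsigned 32-bit arithmetic is used so results wrap
modulo 2^32.

```c
#include <stdint.h>

/* x * 10 without a multiply: 10 = 8 + 2, so x*10 = (x << 3) + (x << 1). */
static uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* General shift-and-add: for each set bit i of the multiplier k,
   add (x << i).  The result is the low 32 bits of x * k. */
static uint32_t mul_shift_add(uint32_t x, uint32_t k)
{
    uint32_t result = 0;
    while (k != 0u) {
        if (k & 1u) {
            result += x;   /* this bit of k contributes the current shifted x */
        }
        x <<= 1;           /* next bit of k is worth twice as much */
        k >>= 1;
    }
    return result;
}
```

For a constant with few set bits, such as 10, the inline form is
two shifts and an add.  For a constant with many set bits the
sequence gets long, which is the kind of trade-off the compiler
has to weigh against the multiply latency.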
-- 
Charlie Price    cprice@mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA   94086