Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!coherent!NeXT!chansen
From: chansen@NeXT.UUCP (Craig Hansen)
Newsgroups: comp.arch
Subject: Re: Double Width Integer Multiplication and Division
Summary: The Real Reason Mips has no double length divide
Message-ID: <4016@bauhaus.NeXT.UUCP>
Date: 7 Jul 89 17:41:10 GMT
References: <1046@aber-cs.UUCP> <1380@l.cc.purdue.edu> <13943@haddock.ima.isc.com>
Organization: NeXT Inc., Palo Alto
Lines: 60

There's been some discussion of why double-length multiply and divide
were or were not included on various machines. However, the real
reasons why RISC machines don't like double-length multiply and
divide haven't been hit. I can speak most authoritatively on the Mips
RISC processor.

Most operations on a RISC processor can be expressed as a function of
the contents of two general registers, yielding a single result.
Double-length multiply (two sources; two results) and divide (three
sources; two results) both violate this generalization, which makes
them more expensive to implement. A Nice Clean Architecture would just
add register specifiers and read and write ports until there were
enough, but that's not the way of RISC.
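
To make the operand counts concrete, here is a minimal C sketch of the two
operations on 32-bit words. It is only an illustration, not how any
particular machine implements them: the names pair32, divres32, mul_double
and div_double are hypothetical, the arithmetic is unsigned, and it leans
on a 64-bit integer type (uint64_t) that the hardware under discussion
does not provide.

```c
#include <stdint.h>

/* Two result words of a double-length multiply (hypothetical type). */
typedef struct { uint32_t hi, lo; } pair32;

/* Two result words of a divide (hypothetical type). */
typedef struct { uint32_t quot, rem; } divres32;

/* Double-length multiply: two sources in, two results out. */
static pair32 mul_double(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;           /* full 64-bit product */
    pair32 r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* Double-length divide: three sources in (hi, lo, divisor), two results out.
   Assumption: the caller guarantees d != 0 and hi < d, so the quotient
   fits in one word. */
static divres32 div_double(uint32_t hi, uint32_t lo, uint32_t d)
{
    uint64_t n = ((uint64_t)hi << 32) | lo; /* 64-bit dividend */
    divres32 r = { (uint32_t)(n / d), (uint32_t)(n % d) };
    return r;
}
```

Note that div_double takes three word-sized arguments, one more than a
register file with two read ports can supply in a single access.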

The Mips R2000 uses two special-purpose registers, each of which holds
half of the double-length result of a multiply, or the quotient and
remainder of a divide. These registers are very carefully handled in
the implementation to permit them to be written into immediately on
starting up the operation, in order to avoid additional bypass and
staging latches - they're written into two cycles earlier than the
general registers.

This is why operations that modify these registers must not occur
within two cycles after an instruction that reads them: if they are
any closer, an interrupt or exception may require restarting the
instruction stream at the special-register read, even though the
register has already been overwritten by the later instruction, which
was itself aborted.

For double-length divides, you'd need three words of source operand,
and the R2000 only has two general register read ports. Yes, there's
room in the instruction encoding for another register specifier, but
there's no hardware to get a third value from the register file.
Because the special-purpose registers are not bypassed, it would not
be possible to use one of them to hold the third value;
if an interrupt or exception required restarting a double-length
divide, that third value would be corrupted.

So that's why there's no double-length divide: although the divider
itself is intrinsically able to perform the operation, it would have
cost more latches and multiplexors to get the data into the unit.
Latches and multiplexors make up a surprising amount of the cost of a
RISC processor; one works very hard to minimize the number of them.

There are other reasons not to bother: a double-length divide can
overflow in difficult-to-detect ways; a single-length divide can also
overflow, but it's easy to detect divide by zero and divide of MININT
by -1 in software while the divide is running in parallel. Of course,
as has been mentioned before, there's no expression of this operation
in C, so a C compiler won't generate it.
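
The two single-length overflow cases can be sketched in C roughly as
follows. This is an illustration under stated assumptions: the function
names div32_overflows and checked_div32 are hypothetical, and the checks
run before the divide here, whereas on the hardware they could overlap
with a divide that is already in progress.

```c
#include <stdbool.h>
#include <stdint.h>

/* A single-length signed divide has exactly two problem cases:
   divide by zero, and MININT / -1 (the quotient does not fit). */
static bool div32_overflows(int32_t n, int32_t d)
{
    return d == 0 || (n == INT32_MIN && d == -1);
}

/* Hypothetical wrapper: returns false instead of dividing when the
   result would be undefined; otherwise stores quotient and remainder. */
static bool checked_div32(int32_t n, int32_t d, int32_t *quot, int32_t *rem)
{
    if (div32_overflows(n, d))
        return false;
    *quot = n / d;   /* C99 and later truncate toward zero */
    *rem  = n % d;
    return true;
}
```

Both tests are simple comparisons against constants, which is what makes
them cheap enough to do in software.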

Finally, it should be noted that an integer divide is actually more
complex (time x space-wise) than a floating-point divide: there are
fast redundant-representation techniques that only work for normalized
numbers. You'd probably find that a multiple-precision divide can be
implemented faster using floating-point arithmetic than fixed-point on
most RISC machines.

Regards,
Craig Hansen