Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Keywords: arithmetic,arbitrary precision,benchmark,modular arithmetic
Message-ID: <41497@mips.mips.COM>
Date: 14 Sep 90 06:50:07 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <353@kaos.MATH.UCLA.EDU> <2118@charon.cwi.nl>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 85

In article <2118@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
....
>Now some (my) opinions:
...
>Mips would have done well to include a 64/32 bit divide in addition to the
>32/32 bit divide.  I do not think that would have added much complexity to
>the processor.
...

The opinion that 32x32->64 and 64/32->64 are essentially equivalent, and
that the latter would cost only a little complexity, is wrong, at least in
the case of the R3000.  (It may not be wrong for a microcoded machine, and
it may or may not be wrong for RISCs in general; however, one may note that
hardly any RISC gives you 32x32->64, but even fewer give you 64/32->64.)

An explanation of why this is so was posted about 9 months ago by Craig
Hansen and others.  I also refer to Patterson & Hennessy, Appendix A on
Computer Arithmetic.  I won't repeat all of the details, but basically:

1) If divide follows the same, regular decoding structure as everything
   else, and if the dividend specifies the first of 2 registers (to get
   64 bits), consider the consequences, assuming it was
        div     dividend,divisor
   1a) Either you need an even/odd register pair (to get dividend and
       dividend|1), OR you need an extra adder (used by nothing else) to
       get dividend and dividend+1.
   1b) It becomes the ONLY instruction that needs to fetch 3 32-bit
       registers as inputs.....
       SO you need to add a 3rd read-port to the register file, OR you
       must irregularize the pipeline, in a way that no other instruction
       does, to use 2 register fetch cycles...

2) Suppose you're willing to do that, or to pay the price of extra
   instructions to set up the 64-bit dividend in the extra registers.
   2b) For a bunch of reasons, you end up with extra latches & muxes (BAD).
       (Craig covered this.)

3) But suppose you're willing to buy all of that.  Now, if you look at
   Patterson and Hennessy, you will discover that if you do 32x32->64 and
   64/32->32, as they say:
        "Note that the two block diagrams in Figure A.2 are very similar. ...
        By allowing these registers to shift bidirectionally, the same
        hardware can be shared between multiplication and division."
        "Figure A.2. The multiplier has an n-bit adder... the divider has
        an n+1 bit adder...."
   a) Integer multiply/divide takes up a nontrivial (not huge, but you can
      certainly see it on the layout) chunk of space.
   b) All of this stuff really just barely fit on the original R2000 in
      2-micron CMOS, and if you look at the layout again, you'll find that
      the right side of the chip is a regular stack of 32-bit-wide
      datapaths, except that the mul/div unit sticks out a few more bits.
   c) If you want 64/32->64, you discover that:
        the divisor wants to be expanded to 64 bits, i.e., n=64
        now you want to have a 65-bit adder, which:
                may well be slower than a 33-bit adder
                may have exceedingly awkward layout issues, either
                requiring something to be folded, or else being twice as
                wide as the whole rest of the stack.
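
To make the "n+1 bit adder" remark in 3) concrete, here is a minimal C
sketch of shift-and-subtract division with a 64-bit dividend, a 32-bit
divisor, and a 32-bit quotient.  It is only an illustration, not the
R2000/R3000 datapath and not a transcription of Figure A.2: the names
`divres32` and `shift_subtract_div` are made up for this example, and the
trial subtract is written as compare-then-subtract rather than
subtract-and-restore.  The point to notice is that the partial remainder p
needs 33 significant bits when n=32.  With n=64 the same loop would need a
65-bit partial remainder, which no longer fits in a uint64_t, a loose
software analogue of the adder-width problem in 3c).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} divres32;

/*
 * Divide the 64-bit value (hi:lo) by a 32-bit divisor, producing a 32-bit
 * quotient and remainder, one quotient bit per iteration.
 * Assumption: hi < divisor, so the divisor is nonzero and the quotient
 * fits in 32 bits.  Passing hi = 0 gives an ordinary 32/32 divide.
 */
static divres32 shift_subtract_div(uint32_t hi, uint32_t lo, uint32_t divisor)
{
    assert(hi < divisor);

    uint64_t p = hi;            /* partial remainder: low 33 bits are used */
    uint32_t a = lo;            /* dividend shifts out, quotient shifts in */
    const uint64_t b = divisor;

    for (int i = 0; i < 32; i++) {
        /* Shift the (p,a) pair left by one bit. */
        p = (p << 1) | (a >> 31);
        a <<= 1;

        /* Before the shift p < b, so now p <= 2*b - 1: up to 33 bits. */
        if (p >= b) {
            p -= b;             /* subtract succeeds: quotient bit is 1 */
            a |= 1u;
        }
    }

    divres32 r;
    r.quot = a;
    r.rem = (uint32_t)p;        /* final remainder is < b, fits in 32 bits */
    return r;
}
```

For example, shift_subtract_div(0, 100, 7) yields quotient 14 and
remainder 2.  Calling it with hi = 0xFFFFFFFE and divisor = 0xFFFFFFFF
drives p to 0x1FFFFFFFD after the first shift, which is where the 33rd bit
is actually needed.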

4) SUMMARY:
   32x32->64 with 32/32->32
        - is regular in decode & register fetch
        - shares hardware well between mul and div
        - has few nasty, ugly layout implications
   32x32->64 with 64/32->64
        - is surprisingly irregular; in fact, it easily causes numerous
          special cases in the mainline paths of the machines, where it
          would be the ONLY instruction to have some property
        - doesn't share as easily
        - is rife with ugly layout effects

MORAL:
        - Apparently simple-looking additions can have surprisingly ugly
          and widespread effects.
        - 32x32->64 and 64/32->64 possess drastically different
          implementation effects in the context of a lean-pipelined,
          single-issue, 2-read-port, 32-bit RISC.  They do NOT have the
          implementation similarity that one would expect.
        - The hardware folks could probably say this better, but these are
          at least some of the issues.
--
-john mashey    DISCLAIMER:
UUCP:   mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:    408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086