Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!pasteur!ic!rudell
From: rudell@ic.uucp (Richard Rudell)
Newsgroups: comp.arch
Subject: Re: SPARC and the Slow Multiply Instruction
Message-ID: <1157@pasteur.Berkeley.Edu>
Date: 2 Mar 88 06:12:34 GMT
Sender: news@pasteur.Berkeley.Edu
Reply-To: rudell@ic.UUCP (Richard Rudell)
Organization: U.C. Berkeley EECS CAD Group
Lines: 108
References:
Keywords:

I became interested in the Sun/4 multiply/divide timings after reading
Appendix E of the Sun/4 Architecture Manual.  This appendix describes in
detail the SPARC assembly language routines for doing 32-bit multiply
and divide.

SPARC has a multiply-step instruction for unsigned multiply; a multiply
takes 32 instructions plus a minimum of another 8 (even if both operands
are positive) to check for negative arguments in a signed multiply.
They also place another 5 instructions at the head of the routine to
check for a 'short' case (i.e., if argument %o0 is less than 12 bits
long).  Hence, 12-bit multiplies go faster than the general case.  Note
that the routine does NOT compare the two operands before this test to
see which one is smaller; hence, the short case only speeds things up if
the compiler knows that one operand is small (e.g., structure size
constants?).  Function call overhead to reach the multiply routine is
negligible because the Sun compiler avoids the normal subroutine linkage
on 'leaf' routines.

The divide algorithm is iterative, and its instruction count is
difficult to predict.  It is interesting that Sun decided not to provide
a divide-step instruction as well.

Anyway, here are some 'experimental' timing results.  Great care/pain
was taken to get around the Sun/4 optimizing compiler attempting to move
things out of loops and dropping 'dead' code.
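An aside on the multiply routine described above, before the numbers:
its overall shape (a short-operand test at the head, a sign check, then
up to 32 steps) can be roughed out in C.  This is only a sketch and NOT
the Appendix E code.  The names soft_mul, mul_steps and SHORT_BITS are
made up here, the 64-bit result is a choice made for the sketch, and a
plain one-bit shift-and-add loop stands in for the SPARC multiply-step
instruction, whose exact semantics are not reproduced.

```c
#include <stdint.h>

/* Hypothetical name: operands shorter than this many bits take the short path. */
#define SHORT_BITS 12

/* Plain shift-and-add, one multiplier bit per step.  This is a stand-in
 * for a multiply-step loop, not the SPARC instruction itself. */
static uint64_t mul_steps(uint32_t multiplier, uint32_t multiplicand, int steps)
{
    uint64_t product = 0;
    uint64_t addend = multiplicand;

    for (int i = 0; i < steps; i++) {
        if (multiplier & 1u)
            product += addend;  /* current multiplier bit is set */
        multiplier >>= 1;       /* move on to the next multiplier bit */
        addend <<= 1;           /* weight of that next bit */
    }
    return product;
}

/* Signed 32x32 -> 64 multiply built from the step loop (hypothetical helper). */
int64_t soft_mul(int32_t x, int32_t y)
{
    /* Short-case test at the head: only the FIRST operand is examined,
     * so a small second operand does not help. */
    int steps = (((uint32_t)x >> SHORT_BITS) == 0) ? SHORT_BITS : 32;

    /* Sign check: multiply magnitudes, negate the product if signs differ. */
    int negate = (x < 0) != (y < 0);
    uint32_t a = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;
    uint32_t b = (y < 0) ? 0u - (uint32_t)y : (uint32_t)y;

    uint64_t p = mul_steps(a, b, steps);
    return negate ? -(int64_t)p : (int64_t)p;
}
```

In this sketch soft_mul(123456, 789012) walks all 32 steps, since 123456
needs 17 bits.  soft_mul(12, 789012) takes the 12-step path, while
soft_mul(789012, 12) does not, which is the asymmetry noted above.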
The goal was to measure raw instruction times:

    Multiply: 123456 * 789012
        MicroVAX II     5.3 us
        Sun 3/180       2.5 us
        Sun 4/280       3.0 us

    Divide: 1234567890 / 789012
        MicroVAX II     7.7 us
        Sun 3/180       7.2 us
        Sun 4/280       9.3 us

Multiply and divide are slow.  Let's hope they are not used frequently.

On the other hand, here are the times for executing:

    f(f(0)+f(0))    given    f(a) { return a; }

(i.e., 3 function calls with 1 argument each, plus a single addition.)

        MicroVAX II    55.9 us
        Sun 3/180      10.3 us
        Sun 4/280       1.4 us

Note that this is an ideal case for the Sun 4 because the register
windows NEVER overflow.

Therefore, we proclaim the Sun/4 a 1.7 VAX-MIP multiplier, a .8 VAX-MIP
divider, and a 40 VAX-MIP function-caller.

The microcycle time of the uVAX is 200 ns.  This compares to the Sun/4
'microcycle' time of 62.5 ns (except the uVAX never misses in its
'instruction cache').  This means that the uVAX takes only about 26
cycles for the multiply given above; the Sun/4 takes 48 cycles.  For
divide, the counts are about 38 cycles for the uVAX and 149 cycles for
the Sun/4.

Oh, by the way, the Sun 4/280 is an 8.6 VAX-MIP machine for the program
Espresso (a strictly integer bit-cruncher).  Apparently multiply is not
very important to Espresso.

*** EDITORIAL MODE ON ***

I like RISC.  The VAX architecture is a big lose if you want to go fast.
VAX is a pig at function calls.  But this does not mean that 'simple is
better' and 'RISC multiply is no slower than executing out of
microcode.'  When you consider operand setup, etc., microcode or special
hardware is a win for multiply and divide (even if the algorithm is
still sequential, 1 or 2 bits at a time).  This may not be a problem for
many typical Unix applications, but there are some applications (16-bit
DSP simulation, for example) which may run very slowly on a Sun/4
compared to a Sun/3 or uVAX.

I agree heartily with the MIPS Co. (and AMD 29000) decisions to put
multiply/divide instructions in the architecture.
This allows different models/versions to implement them differently
without losing binary compatibility; current models can either trap to
the equivalent multiply-step loops in software (AMD 29000), or use
special 'hard-wired microcode' (MIPS R2000).

Why did SPARC not include a 2-bit-at-a-time multiply instruction, or a
special multiply instruction in the architecture to allow for future
growth?  Why is there no divide instruction or divide-step instruction?

Is this important to anyone?

*** EDITORIAL MODE OFF ***

---------------------------------------------------------------------------
Richard Rudell
Graduate Student                rudell@ic.berkeley.edu  (ARPA)
205 Cory Hall                   ...!ucbvax!ic!rudell    (UUCP)
University of California        (415) 642-3626
Berkeley, CA 94720