Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!decwrl!pa.dec.com!decprl!decprl!shand
From: shand@prl.dec.com (Mark Shand)
Newsgroups: comp.arch
Subject: Re: integer multiplies on a Sparc
Keywords: integer vs. floating point
Message-ID: <1991Feb13.134057.2237@prl.dec.com>
Date: 13 Feb 91 13:40:57 GMT
References: <16864@ogicse.ogi.edu>
Sender: news@prl.dec.com (USENET News System)
Reply-To: shand@prl.dec.com (Mark Shand)
Distribution: na
Organization: Digital Equipment Corporation - Paris Research Laboratory
Lines: 43

Integer multiply on SPARC is indeed poor. I recently added an assembler
kernel for SPARC to our bignum package and found that the fastest way to do
multiprecision integer multiply was through the FPU. The primitive I use
is 32-bit x 16-bit -> 48-bit, which can be computed exactly in double
precision because the 48-bit product fits within the double-precision
mantissa.

I've only timed it on a SPARCstation 1, which has a rather slow 9-cycle
double-precision multiply. Overall throughput for multiprecision integer
multiplies is about a quarter of that of a MIPS R2000, which has a 12-16
cycle (depending on how you count) 32x32->64 integer multiply, but it is
still faster than any other way of doing full-word integer multiply on an
early SPARC.

(Our bignum package is available by mail from librarian@prl.dec.com; we
will be announcing an FTP server soon.)

Even on a more balanced machine like the MIPS R2000/R3000, floating-point
multiply, although more resource-intensive than integer multiply, is
treated as a higher-priority operation and, because more hardware is
devoted to it, takes fewer cycles.

Moral: tradeoffs between integer and float are subtle. Just because an
operation CAN be implemented more efficiently doesn't mean it HAS BEEN.
Of course next year's CPU designers will benchmark the neural net code
that you finally decided to cast in floats, even though ints would have
served you equally well, and those designers will deprecate integer
multiply even further.
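To make the 32x16->48 primitive mentioned above concrete, here is a minimal C sketch. It is not the assembler kernel from the bignum package: the names mul32x16_fp and mul32x32_fp are hypothetical, and it assumes IEEE 754 doubles (53-bit significand) and a compiler with a 64-bit integer type, which an early SPARC would not have provided natively. It only illustrates why the product is exact and how two such products give both the high and low words of a 32x32 multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit x 16-bit -> 48-bit product via the FPU.
   Both operands convert to double exactly, and the product is below
   2^48, so it fits in a 53-bit significand with no rounding. */
static uint64_t mul32x16_fp(uint32_t a, uint16_t b)
{
    double p = (double)a * (double)b;
    assert(p < 281474976710656.0);   /* 2^48: sanity check on the range */
    return (uint64_t)p;              /* exact, p is a whole number */
}

/* Hypothetical helper: 32x32 -> 64 built from two 32x16 products.
   a*b = a*b_low + (a*b_high << 16), returned as separate hi/lo words,
   which is what a multiprecision inner loop needs. */
static void mul32x32_fp(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p_low  = mul32x16_fp(a, (uint16_t)(b & 0xFFFFu));
    uint64_t p_high = mul32x16_fp(a, (uint16_t)(b >> 16));
    uint64_t p = p_low + (p_high << 16);   /* cannot overflow: p == a*b < 2^64 */

    *hi = (uint32_t)(p >> 32);
    *lo = (uint32_t)p;
}
```

A real kernel would keep the partial products in floating-point registers and do the carry propagation by hand; the 64-bit integer arithmetic here is only a convenience for the sketch.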
Questions:

Does anyone know which SPARC implementations include integer multiply
support beyond the multiply step instruction? What is the opcode? What
happens if an early SPARC hits such an opcode? Have these SPARC
implementations found their way into any production machines yet?

Another thing that bugged me about multiply step is that it doesn't seem
to give any way to get the high-order part of the result. MIPS, by
contrast, gives you lo and hi result registers. This is essential in
multiprecision work. Am I missing something in multiply step? Do the
newer instructions help here?

Mark Shand.