Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!decwrl!pa.dec.com!decprl!decprl!shand
From: shand@prl.dec.com (Mark Shand)
Newsgroups: comp.arch
Subject: Re: integer multiplies on a Sparc
Keywords: integer vs. floating point
Message-ID: <1991Feb13.134057.2237@prl.dec.com>
Date: 13 Feb 91 13:40:57 GMT
References: <16864@ogicse.ogi.edu>
Sender: news@prl.dec.com (USENET News System)
Reply-To: shand@prl.dec.com (Mark Shand)
Distribution: na
Organization: Digital Equipment Corporation - Paris Research Laboratory
Lines: 43

Integer multiply on SPARC is indeed poor.  I recently added
an assembler kernel for SPARC to our bignum package and found
the fastest way to do multiprecision integer multiply was
through the FPU.  The primitive I use is 32-bit x 16-bit -> 48-bit,
which can be computed exactly in double precision.  I've only timed
it on a SPARCstation 1, which has a rather slow 9-cycle DP multiply.
The overall performance for multiprecision integer multiplies
is about 4 times slower than on a MIPS R2000, which has a
12-16 cycle (depending on how you count) 32x32->64 integer
multiply, but it is still faster than any other way of doing
full-word integer multiply on an early SPARC.
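
To make the primitive concrete, here is a minimal C sketch of the idea,
not the assembler kernel itself.  It assumes IEEE double precision
(53-bit significand) and a compiler that provides uint64_t; the function
names mul32x16_fp and mul32x32_fp are made up for illustration.  Since
the product of a 32-bit and a 16-bit operand is below 2^48, the
floating-point multiply is exact, and two such products combine into a
full 32x32->64 result.

```c
#include <stdint.h>

/* Exact 32x16 -> 48-bit product computed with a double-precision
   multiply.  Both operands and the product (< 2^48) fit in the
   53-bit significand of an IEEE double, so no rounding occurs. */
uint64_t mul32x16_fp(uint32_t a, uint16_t b)
{
    double p = (double)a * (double)b;
    return (uint64_t)p;
}

/* 32x32 -> 64-bit product assembled from two 32x16 partial products:
   a*b = (a * b_hi) * 2^16 + (a * b_lo). */
uint64_t mul32x32_fp(uint32_t a, uint32_t b)
{
    uint64_t lo = mul32x16_fp(a, (uint16_t)(b & 0xFFFFu));
    uint64_t hi = mul32x16_fp(a, (uint16_t)(b >> 16));
    return (hi << 16) + lo;   /* total is a*b < 2^64, so no overflow */
}
```

In a real kernel the integer/float conversions and the recombination of
partial products would be arranged to keep the FPU pipeline busy; the
sketch only shows why the arithmetic is exact.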

(Our bignum package is available by mail from librarian@prl.dec.com;
we will be announcing an FTP server soon.)

Even on a more balanced machine like the MIPS R2000/R3000, floating-point
multiply, although more resource-intensive than integer multiply, is a
higher-priority operation and, through the devotion of more hardware,
takes fewer cycles.

Moral: tradeoffs between integer and float are subtle.  Just because
an operation CAN be implemented more efficiently doesn't mean it
HAS BEEN.

Of course, next year's CPU designers will benchmark the neural net code
that you finally decided to cast in floats, even though ints would
have served you equally well, and those designers will deprecate
integer multiply even further.

Questions:

Does anyone know which SPARC implementations include integer multiply
support beyond the multiply step instruction?  What is the opcode?
What happens if an early SPARC hits such an opcode?  Have these SPARC
implementations found their way into any product machines yet?

Another thing that bugged me about multiply step is that it doesn't
seem to give any way to get the high-order part of the result.
MIPS, by contrast, gives you lo and hi result registers.  This
is essential in multiprecision work.  Am I missing something in
multiply step?  Do the newer instructions help here?
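
To show why the high-order word matters, here is a minimal C sketch of
the inner loop of a multiprecision multiply by a single word.  It is an
illustration only, not code from our package: the name mul_word is made
up, and it assumes a compiler with a 64-bit uint64_t standing in for
the hi/lo register pair.  The high word of each 32x32 product becomes
the carry into the next digit, so without it the loop cannot be written
directly.

```c
#include <stdint.h>
#include <stddef.h>

/* r[0..n-1] = a[0..n-1] * w (least significant word first).
   Returns the carry-out, i.e. the most significant word of the result. */
uint32_t mul_word(uint32_t *r, const uint32_t *a, size_t n, uint32_t w)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + (2^32-1) < 2^64, so this cannot overflow. */
        uint64_t p = (uint64_t)a[i] * w + carry;
        r[i]  = (uint32_t)p;          /* low word  (what "lo" would hold) */
        carry = (uint32_t)(p >> 32);  /* high word (what "hi" would hold) */
    }
    return carry;
}
```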

Mark Shand.