Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!brutus.cs.uiuc.edu!ux1.cso.uiuc.edu!iuvax!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Summary: Arithmetic subroutines are orders of magnitude slower than
	 hardware
Message-ID: <1804@l.cc.purdue.edu>
Date: 27 Dec 89 12:51:36 GMT
References: <84768@linus.UUCP> <8840004@hpfcso.HP.COM>
Organization: Purdue University Statistics Department
Lines: 63

In article <8840004@hpfcso.HP.COM>, dgr@hpfcso.HP.COM (Dave Roberts) writes:
> 
> >The SPARC is brain dead [as were its designers] when it comes to doing
> >integer arithmetic. It can't multiply and it can't divide.
> 
> >-- 
> >Bob Silverman
> >#include <std.disclaimer>
> >Internet: bs@linus.mitre.org; UUCP: {decvax,philabs}!linus!bs
> >Mitre Corporation, Bedford, MA 01730
> >----------
> 
> 
> Geeze Bob,
> 	The thing is a SPARC.  It's a RISC machine.  Integer mult and
> divide are the first things to go when you design a RISC.  There should
> be some funky instructions to help you out, like "shift and add" for
> multiplication.  Trust me, you're better off (in speed, that is) for
                   ^^^^^^^^
> not having those functions, and I'll bet that you can write a routine
> that can do them just about as fast as they could internally.
> I don't really know much about SPARCs but I know that the designers
> at Sun weren't "brain dead".

It is clear that you are not to be trusted (see above).  To multiply
two 32 bit numbers to get a 64 bit product on a machine whose multiply
is only 32x32 -> 32, the 32 bit numbers must be divided into 16 bit
parts.  The whole multiplication takes about 20 operations (count them).
Shift and add is far slower.  Divide is even worse.  Also, there is
considerable overhead in a subroutine call; there are registers to save
and restore.  Open subroutines (in-line functions) remove the call
overhead, but the operation count remains.
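
To make the count concrete, here is a minimal C sketch of both approaches,
assuming only 32 bit unsigned arithmetic with a truncated 32 bit product is
available.  The names u64pair, mul32_wide and mul32_shift_add are
hypothetical, chosen for illustration; this is not code from the CRAY or
from any SPARC library.

```c
#include <stdint.h>

/* Hypothetical type: a 64 bit product held as two 32 bit words. */
typedef struct { uint32_t hi, lo; } u64pair;

/* 32x32 -> 64 multiply built from 16 bit parts, straight-line code. */
u64pair mul32_wide(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint32_t p1 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t p2 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t p3 = a_hi * b_hi;   /* weight 2^32 */

    /* Sum of three values below 2^16, so it cannot overflow 32 bits. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    u64pair r;
    r.lo = (p0 & 0xFFFFu) | (mid << 16);
    r.hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return r;
}

/* The same product by shift and add: one loop pass per bit of b. */
u64pair mul32_shift_add(uint32_t a, uint32_t b)
{
    u64pair r = {0, 0};
    uint32_t m_hi = 0, m_lo = a;      /* multiplicand, shifted left each pass */

    while (b != 0) {
        if (b & 1u) {
            uint32_t old = r.lo;
            r.lo += m_lo;
            r.hi += m_hi + (r.lo < old);   /* propagate the carry */
        }
        m_hi = (m_hi << 1) | (m_lo >> 31);
        m_lo <<= 1;
        b >>= 1;
    }
    return r;
}
```

The 16 bit split comes to roughly twenty shifts, masks, multiplies and adds
with no loop, while the shift-and-add version may go around its loop up to
32 times, with a test and several operations on each pass.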

I am sure that Bob Silverman knows how to write efficient subroutines.
He has to use them anyhow, as he is multiplying and dividing numbers of
several hundred bits.  But even for less ambitious work, good integer
arithmetic is needed.  If floating operations are wanted with more
precision than the hardware was designed for, integer arithmetic must
be used to build them.
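
As a rough sketch of why the multiply matters for long numbers, assume a
number is stored as 16 bit digits, least significant first, so that each
digit product fits a 32 bit multiply.  The routine name mp_mul and this
layout are illustrative assumptions, not anyone's production code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Schoolbook multiplication of two multi-digit integers.
 * a has na digits, b has nb digits, out must have room for na + nb digits.
 * Digits are 16 bits wide, least significant digit first.
 */
void mp_mul(const uint16_t *a, size_t na,
            const uint16_t *b, size_t nb,
            uint16_t *out)
{
    size_t i, j;

    for (i = 0; i < na + nb; i++)
        out[i] = 0;

    for (i = 0; i < na; i++) {
        uint32_t carry = 0;
        for (j = 0; j < nb; j++) {
            /* At most 0xFFFF*0xFFFF + 0xFFFF + 0xFFFF, which fits 32 bits. */
            uint32_t t = (uint32_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint16_t)(t & 0xFFFFu);
            carry = t >> 16;
        }
        out[i + nb] = (uint16_t)carry;
    }
}
```

The inner loop does one multiply per pair of digits, so whatever a single
multiply costs is paid na*nb times.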

There are also many other kinds of operations cheap in hardware and
expensive in software.  RISC machines may be good for the types of
operations the designers anticipated, but it is difficult to do much
about the ones left out.  The CRAYs can be considered RISC vector 
machines, and the vector operations omitted are extremely difficult
to get around.  The above instruction count for double precision was
derived from the CRAY.

We even have a chicken-and-egg problem.  Any fairly good programmer
designs the program to take into account the capabilities of the machine.
I know that the gurus claim that this should not be so, but it is not
unusual for me to think of modifications, or even totally new ways of
doing things, which the compiler cannot find unless those specific ways
are put into the compiler.  If a machine does not have hardware square
roots, one avoids square roots, as there are usually faster ways.
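
A small C illustration of this kind of avoidance, assuming the only
question is whether a point lies within a given radius.  The function name
within_radius is hypothetical and the case is chosen just for illustration.

```c
/*
 * Returns 1 if the point (dx, dy) is within distance r of the origin.
 * Comparing squared quantities gives the same answer as comparing
 * sqrt(dx*dx + dy*dy) with r, provided r is not negative.
 */
int within_radius(double dx, double dy, double r)
{
    return dx * dx + dy * dy <= r * r;
}
```

For example, within_radius(3.0, 4.0, 5.0) is true, and no square root was
taken.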

One thing which might help is a mailing list to discuss these ideas,
and to collect the numerous operations that are efficient in hardware
and expensive in software.  Those who know me will agree that I am not
the person to run this.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)