Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!brutus.cs.uiuc.edu!ux1.cso.uiuc.edu!iuvax!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Summary: Arithmetic subroutines are orders of magnitude slower than hardware
Message-ID: <1804@l.cc.purdue.edu>
Date: 27 Dec 89 12:51:36 GMT
References: <84768@linus.UUCP> <8840004@hpfcso.HP.COM>
Organization: Purdue University Statistics Department
Lines: 63

In article <8840004@hpfcso.HP.COM>, dgr@hpfcso.HP.COM (Dave Roberts) writes:
>
> >The SPARC is brain dead [as were its designers] when it comes to doing
> >integer arithmetic. It can't multiply and it can't divide.
>
> >--
> >Bob Silverman
> >#include
> >Internet: bs@linus.mitre.org; UUCP: {decvax,philabs}!linus!bs
> >Mitre Corporation, Bedford, MA 01730
> >----------
>
> Geeze Bob,
> The thing is a SPARC. It's a RISC machine. Integer mult and
> divide are the first things to go when you design a RISC. There should
> be some funky instructions to help you out, like "shift and add" for
> multiplication. Trust me, you're better off (in speed, that is) for
                  ^^^^^^^^
> not having those functions, and I'll be that you can write a routine
> that can do them just about as fast as they could internally.
> I don't really know much about SPARCs but I know that the designers
> at Sun weren't "brain dead".

It is clear that you are not to be trusted (see above). To multiply two 32-bit numbers to get a 64-bit product on a 32x32 -> 32 machine, the 32-bit numbers must be divided into 16-bit parts. The whole operation takes about 20 operations (count them). Shift and add are far slower. Divide is even worse. Also, there is considerable overhead in a subroutine call; there are registers to save and restore. Open subroutines (in-line functions) are a way around the call overhead, but they still need all those extra operations.
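For concreteness, here is a minimal C sketch of the 16-bit split described above, assuming unsigned operands and a machine whose multiply only returns the low 32 bits. The name mul32x32_64 is hypothetical, and this is the ordinary schoolbook decomposition, not necessarily the exact CRAY sequence the count above was taken from. Signed operands would need an additional correction step.

```c
#include <stdint.h>

/* Hypothetical helper: full 64-bit product of two unsigned 32-bit values,
   using only 32x32 -> 32 multiplies, shifts, masks and adds. */
static void mul32x32_64(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    /* Split each operand into 16-bit halves (4 operations). */
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products (4 multiplies); each fits in 32 bits
       because 0xFFFF * 0xFFFF < 2^32. */
    uint32_t p0 = a_lo * b_lo;
    uint32_t p1 = a_lo * b_hi;
    uint32_t p2 = a_hi * b_lo;
    uint32_t p3 = a_hi * b_hi;

    /* Column at bit 16: at most 3 * 0xFFFF, so no 32-bit overflow. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    /* Low word: low 16 bits of mid above the low 16 bits of p0. */
    *lo = (mid << 16) | (p0 & 0xFFFFu);

    /* High word: p3 plus every carry out of the lower columns. */
    *hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
}
```

Counted out, this is 4 multiplies plus roughly 18 shifts, masks and adds, on the order of the "about 20 operations" above, where a single widening-multiply instruction would do if the hardware provided one.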
I am sure that Bob Silverman knows how to write efficient subroutines. He has to use them anyhow, as he is multiplying and dividing numbers of several hundred bits. But even when less is wanted, good integer arithmetic is needed: if floating-point operations must be carried to more precision than the hardware was designed for, integer arithmetic has to be used.

There are also many other kinds of operations that are cheap in hardware and expensive in software. RISC machines may be good for the types of operations the designers anticipated, but it is difficult to do much about the ones left out. The CRAYs can be considered RISC vector machines, and the vector operations omitted are extremely difficult to get around. The above instruction count for double precision was derived from the CRAY.

We even have a chicken-and-egg problem. Any fairly good programmer designs the program to take into account the capabilities of the machine. I know that the gurus claim that this should not be so, but it is not unusual for me to think of modifications, or even totally new ways of doing things, which the compiler cannot find unless those specific methods have been built into it. If a machine does not have hardware square roots, one avoids square roots, as there are usually faster ways.

One thing which might help is a mailing list to discuss these ideas and to collect the numerous operations that are efficient in hardware and expensive in software. Those who know me will agree that I am not the person to run this.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)