Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!rpi!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!hplabs!hpfcso!dgr
From: dgr@hpfcso.HP.COM (Dave Roberts)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <8840005@hpfcso.HP.COM>
Date: 4 Jan 90 20:58:06 GMT
References: <84768@linus.UUCP>
Organization: Hewlett-Packard, Fort Collins, CO, USA
Lines: 56

Sorry, guys and gals, I didn't intend to start a pounce-on-RISC thread.
When I answered Bob's question, it was intended to show him that there
was a way to do multiply and divide. Now for some comments:

(1) SPARCs will get multiply and divide. This is from a guy at Sun.
Coming soon to a SPARC station near you...

(2) By suggesting that Bob was "much better off" (unclear on my part), I
didn't mean to suggest that he was going to get stellar integer
performance all the time. Rather, in general, his whole program should
run faster. I guess it didn't, but then again I didn't look at the code.

(3) As some have pointed out, the reason for removing those instructions
from a RISC architecture is that *most* programs don't do a whole lot of
multiplications between arbitrary 32-bit integers. Usually it is an
arbitrary integer and a known (though not necessarily small) integer
constant. With the known constant you can reduce the multiply to a known
sequence of shifts and adds, which a good compiler will do (in fact, many
CISC machines would run faster if their compilers did this too, instead
of just inserting an XX-cycle multiply instruction).

(4) If you need the speed, you write the code inline. Loops kill you on
whatever architecture you use. If you do huge numbers of arbitrary 32x32
multiplies, your code will explode, but hey, this is a RISC machine and
your code size is already through the roof, right? If you call a
subroutine every time you want to do a multiply, the overhead of the call
will kill you. But notice that this wasn't what I suggested, either.

(5) The original point was that most programs don't need the kind of
integer numerical performance that, I guess, Bob's does, and in general
the shifts and adds (for computing things like array indices and so
forth) are just fine. His is a (semi)pathological case in the whole
universe of computer programs. As a user who doesn't generate programs
like that, I'd rather all the other instructions be sped up a bit by
allowing higher clock speeds, etc. And most users don't generate or use
programs like that.

(6) If you really need the blazing integer speed, buy a coprocessor.
That is also one of the fundamental RISC ideas. There are times when
things just aren't done well by software and do need hardware help. This
option also allows you to get *really, really* fast integer speed by
using a multiplier array (it works by generating all the product terms at
once and then adding the whole sh'bang together; it's fast as hell but it
uses tons of chip area, which makes it perfect for a coprocessor).
Someone else pointed this out a few postings back (in the DSP entries, I
think). Sure, it costs more, but I'd rather save the cost when I don't
need it. (Remember that floating point is also a coprocessor. Only
naivety would hold that integer operations can't be also.)

Dave Roberts
Hewlett-Packard Co.
dgr@hpfcla.hp.com
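
For readers who want to see points (3), (4) and (6) in concrete terms,
here is a minimal C sketch. It is only an illustration, not anything
from the original discussion: the function names (mul10, mul_loop,
mul_array, check_against_builtin) are made up for this example, the
constant 10 is an arbitrary choice, and a real compiler picks its own
shift/add sequence. mul10 shows a multiply by a known constant reduced
to shifts and an add; mul_loop is the general shift-and-add loop that a
software multiply routine might use; mul_array imitates, sequentially
and in software, the idea of forming every partial product first and
then summing them, which a hardware multiplier array does in parallel.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by the known constant 10 = 8 + 2: two shifts and one add.
   The constant is an arbitrary example, not taken from the post. */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* General 32x32 multiply keeping the low 32 bits, done with a
   shift-and-add loop: one conditional add per bit of b that is examined. */
static uint32_t mul_loop(uint32_t a, uint32_t b)
{
    uint32_t acc = 0;
    while (b != 0) {
        if (b & 1u)
            acc += a;      /* add the current shifted copy of a */
        a <<= 1;           /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return acc;
}

/* Software model of the multiplier-array idea: build all 32 partial
   products up front, then add them together. Hardware would do both
   steps in parallel; this loop only shows the arithmetic. */
static uint32_t mul_array(uint32_t a, uint32_t b)
{
    uint32_t pp[32];
    uint32_t sum = 0;
    int i;

    for (i = 0; i < 32; i++) {
        /* mask is all ones if bit i of b is set, otherwise zero */
        uint32_t mask = (uint32_t)0 - ((b >> i) & 1u);
        pp[i] = (uint32_t)(a << i) & mask;
    }
    for (i = 0; i < 32; i++)
        sum += pp[i];
    return sum;
}

/* Hypothetical helper: checks both routines against the built-in
   multiply (unsigned arithmetic wraps modulo 2^32 in all three). */
static void check_against_builtin(uint32_t a, uint32_t b)
{
    uint32_t expected = (uint32_t)(a * b);
    assert(mul_loop(a, b) == expected);
    assert(mul_array(a, b) == expected);
}
```

Calling check_against_builtin(12345u, 6789u) exercises both general
routines. The trade-off from point (4) is visible here: mul10 is a
couple of instructions that can sit inline, while mul_loop pays for a
loop (and, if it lives in a subroutine, a call) on every multiply.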