Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!rpi!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!hplabs!hpfcso!dgr
From: dgr@hpfcso.HP.COM (Dave Roberts)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <8840005@hpfcso.HP.COM>
Date: 4 Jan 90 20:58:06 GMT
References: <84768@linus.UUCP>
Organization: Hewlett-Packard, Fort Collins, CO, USA
Lines: 56


Sorry Guys and Gals,
	I didn't intend to start a pounce-on-RISC thread.  When I answered
Bob's question it was intended to show him that there was a way to do
multiply and divide.
	Now for some comments:
	(1) SPARCs will get multiply and divide.  This is from a guy at
	    Sun.  Coming soon to a SPARC station near you...
	(2) By suggesting that Bob was "much better off" (unclear on my
	    part) I didn't mean to suggest that he was going to get stellar
	    integer performance all the time.  Rather, in general, his
	    whole program should run faster.  I guess it didn't, but then
	    again I didn't look at the code.
	(3) As some have pointed out, the reason for removing those
	    instructions from a RISC architecture is that *most* programs
	    don't do a whole lot of multiplications between arbitrary 32-bit
	    integers.  Usually it is an arbitrary integer and a known (though
	    not necessarily small) integer constant.  With the known constant
	    you can reduce the mult to a known sequence of shifts and adds,
	    which a good compiler will do (in fact, many CISC machines would
	    run faster if the compilers would do this for them also instead
	    of just inserting a XX cycle multiply instruction).
	(4) If you need the speed, you write the code inline.  Loops kill
	    you in whatever architecture you use.  If you do huge numbers
	    of arbitrary 32x32 mults, your code will explode, but hey,
	    this is a RISC machine and your code size is already through
	    the roof, right?  If you call a subroutine every time you want
	    to do a multiply, the overhead of the call will kill you.  But
	    notice that this wasn't what I suggested, either.
	(5) The original point was that most programs don't need the kind
	    of integer numerical performance that, I guess, Bob's does,
	    and in general the shifts and adds (for computing things like
	    array indices and so forth) are just fine.
	    Bob's is a (semi)pathological case in the whole universe of computer
	    programs.  As a user who doesn't generate programs like that,
	    I'd rather all the other instructions be sped up a bit by
	    allowing higher clock speeds, etc.  And most users don't generate
	    or use programs like that.
	(6) If you really need the blazing integer speed, buy a coprocessor.
	    That is also one of the fundamental RISC ideas.  There are times
	    when things just aren't done well by software and do need hardware
	    help.  This option also allows you to get *really, really* fast
	    integer speed by using a multiplier array (it works by generating
	    all the product terms at once and then adding the whole
	    sh'bang together; it's fast as hell but it uses tons of chip
	    area, which makes it perfect for a coprocessor).  Someone else pointed this
	    out a few postings back (in the DSP entries, I think).  Sure it
	    costs more for this, but I'd rather save the cost when I don't
	    need it.  (Remember that floating point is also done in a coprocessor.
	    Only naivete would hold that integer operations can't be also.)
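
	To make points (3) and (4) a bit more concrete, here is a rough C
sketch of the two cases.  The function names (mul10, mul_shift_add) are
made up for illustration, and this is not what any particular SPARC
compiler or runtime actually emits; it only shows the idea.  mul10 is the
"known constant" case, where the multiply collapses to a fixed pair of
shifts and one add.  mul_shift_add is the arbitrary 32x32 case (low 32
bits of the product only), written as a loop for brevity; per point (4)
you would unroll it inline if you really cared about speed.

```c
#include <stdint.h>

/* Known-constant case: 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1).
 * Unsigned arithmetic, so the result wraps modulo 2^32 just like a
 * 32-bit multiply would. */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Arbitrary case: classic shift-and-add, one step per bit of b.
 * Returns the low 32 bits of a * b.  The loop exits early once the
 * remaining bits of b are all zero. */
static uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)         /* low bit of b set: add the shifted a */
            product += a;
        a <<= 1;            /* next partial product is twice as large */
        b >>= 1;            /* move on to the next bit of b */
    }
    return product;
}
```

For example, mul10(7) gives 70 with two shifts and an add, while
mul_shift_add(a, b) may take up to 32 trips around the loop, which is
why the arbitrary case is the expensive one.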


Dave Roberts
Hewlett-Packard Co.
dgr@hpfcla.hp.com