Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!samsung!rex!ames!amdcad!nucleus!tim
From: tim@nucleus.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <28635@amdcad.AMD.COM>
Date: 5 Jan 90 14:57:08 GMT
References: <84768@linus.UUCP> <8840005@hpfcso.HP.COM>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 18

In article <8840005@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
| 	(4) If you need the speed, you write the code inline.  Loops kill
| 	    you in whatever architecture you use.  If you do huge numbers
| 	    of arbitrary 32x32 mults, your code will explode, but hey,
| 	    this is a RISC machine and your code size is already through
| 	    the roof, right?  If you call a subroutine every time you want
| 	    to do a multiply the overhead of the call will kill you.  But
| 	    notice that this wasn't what I suggested, either.

Inlining the multiply code is an option, but the call overhead isn't
going to "kill you" -- it would probably be less than 10% -- especially
if these kinds of routines use a special calling sequence, known to the
compiler, that avoids the overhead of a standard call.
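
To make the comparison concrete, here is a minimal sketch of the kind of
software multiply routine under discussion. It is a plain shift-and-add
loop in C, not any particular vendor's library routine; the name
soft_mul32 and the steps counter are assumptions added for illustration
only. The counter reports how many loop iterations a given multiply
takes, which is the work a call/return pair has to be weighed against.

```c
#include <stdint.h>

/*
 * Hypothetical software multiply (illustration only, not a real
 * library routine): classic shift-and-add, one loop iteration per
 * multiplier bit up to and including its highest set bit.
 * Returns the low 32 bits of a * b.  If steps is non-null, it receives
 * the number of loop iterations executed.
 */
uint32_t soft_mul32(uint32_t a, uint32_t b, unsigned *steps)
{
    uint32_t product = 0;
    unsigned n = 0;

    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the shifted multiplicand for a set bit */
        a <<= 1;            /* next partial product weight */
        b >>= 1;            /* next multiplier bit */
        n++;
    }

    if (steps)
        *steps = n;
    return product;
}
```

Each iteration is a test, a conditional add, and two shifts, and an
arbitrary 32-bit multiplier can take up to 32 of them. Against that, a
call and return add a fixed handful of instructions, and fewer still if
the compiler uses a cheaper private calling sequence for such routines.
The exact ratio depends on the machine and on the operands.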

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)