Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!hellgate.utah.edu!cs.utexas.edu!usc!apple!sun-barr!newstop!sun!sun-bb!khb
From: khb@chiba.kbierman@sun.com (Keith Bierman - SPD Advanced Languages)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID:
Date: 15 Jan 90 19:29:39 GMT
References: <8840005@hpfcso.HP.COM> <1249@otc.otca.oz> <5837@orca.wv.tek.com> <1253@otc.otca.oz>
Sender: news@sun.Eng.Sun.COM
Organization: Sun Microsystems
Lines: 93
In-reply-to: gregw@otc.otca.oz's message of 13 Jan 90 12:17:06 GMT

In article <1253@otc.otca.oz> gregw@otc.otca.oz (Greg Wilkins) writes:

.... brand new sparc station X, complete with integer multiply. You now want to buy some software for it: You have a choice of paying $300 for the ABI version, which cannot use the multiply instruction (which is not part of the ABI), but which is ready to run (albeit very slowly, as it is a multiply-intensive application). ...

No. As I pointed out in an earlier posting, it is possible to add instructions AND to get the benefit delivered to ABI-compliant users.

Consider the following gedanken experiment:

ACME HiTech Corp's VP of engineering (Wily E. Coyote) concludes that some nifty mass-market project of theirs (say an HDTV/workstation combo, or part of a navigation system which gets its data one point at a time) absolutely must have a general-purpose CPU ... which can execute the following function at full hardware speed:

    f(ix,ifac) returns
        int(ifac*cos(x))
        int(ifac*sin(x))
        int(ifac*cos(-x))
        int(ifac*sin(-x))

(viz. a mutant Givens transformation)

Clearly no merchant chip (well, aside from some nifty CORDIC chips) has this now. Coyote contacts the SPARC licensing board (or its real-life equivalent :>) and negotiates an opcode (perhaps in supervisor space ... where it doesn't affect anyone else) to be available only in their chips/systems (I have no idea how this is done, but assume that for money it can be arranged).
Consider the following cases:

1) ACME engineers build their software on Sun 4/60s, which lack the instruction.
2) The first spin of the chip can't fit the whole thing ... so it computes sincos(x) in hardware.
3) The second spin does it all ... but has a bug which requires that argument reduction be performed before the rest of the instruction.
4) The third spin does it all ... correctly.

Quiz:

1) How many a.out's (to use SunOS 4.x lingo) are necessary?
2) How many can be ABI compliant?

Answers:

1) Only one a.out is required ... and it can benefit from the hardware.
2) It can be ABI compliant.
3) Yes, being non-ABI compliant might improve performance.

The solution is via shared libraries. The a.out only knows that at runtime there will be a routine called wily_givens. At runtime the runtime loader links in the "right" shared library ... where "right" means the one which matches the local hardware. One ABI-compliant a.out runs in all four cases, and goes faster as the hardware improves. Best performance (probably 1-5% faster in this case) would be obtained by generating the wily_givens instruction directly in the 4th case.

If it were to happen that SPARCs with wily_givens captured a huge chunk of the market (say 90%, for grins), perhaps the ABI would be altered ... causing the other 10% to trap to the OS for emulation. This would only adversely impact the 10% of users who lacked the hardware but employed codes rich in the instruction ... and most of that 10% probably wouldn't notice or care.

Clearly integer multiply would be somewhat more popular than wily_givens, and is less computationally intense ... so the tradeoffs aren't as obvious. But the general case of how to add an instruction, yet remain ABI compliant, is easily handled.

--
Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   | MTS --Only my work belongs to Sun*
I Voted for Bill &  | Advanced Languages/Floating Point Group
Opus                | "When the going gets Weird .. the Weird turn PRO"

"There is NO defense against the attack of the KILLER MICROS!"
Eugene Brooks