Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <2017@crdos1.crd.ge.COM>
Date: 16 Jan 90 13:31:57 GMT
References: <8840005@hpfcso.HP.COM> <1249@otc.otca.oz> <1255@otc.otca.oz> <2819@auspex.auspex.com>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 26

In article <2819@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

| Newer compilers will presumably include a command-line flag instructing
| them to either produce the multiply/divide instructions themselves or
| calls to ".mul"/".div" and company. And such calls are surely *not*
| expanded with the "a.out" file is generated, unless you linked with
| "-Bstatic" - shared libraries, remember?

  Has anyone measured the time taken to just generate the mpy and trap
it vs the time for a procedure call? We used to trap some instructions
on the old GE series 20 years ago, and the time to trap and decode
(table lookup for decode) was only a few % slower than a call, when the
total time to execute the "instruction" was taken into account.

  Would it be better to just generate the instruction all the time and
trap it, rather than use the various libraries? It would certainly give
better performance on the machines with the mpy hardware, and based on
the very slow times reported here might not be a notable loss on
standard ABI SPARC.

  Has anyone measured these numbers to get a ballpark figure? I don't
have a good feel for how long the partial context change would take on
the trap.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Stupidity, like virtue, is its own reward" -me