Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!auspex!guy
From: guy@auspex.auspex.com (Guy Harris)
Newsgroups: comp.arch
Subject: Re: Integer Multiply/Divide on Sparc
Message-ID: <2819@auspex.auspex.com>
Date: 15 Jan 90 20:50:08 GMT
References: <8840005@hpfcso.HP.COM> <1249@otc.otca.oz> <1255@otc.otca.oz>
Organization: Auspex Systems, Santa Clara
Lines: 82

>But assuming that newer compilers will generate them, surely they are
>expanded when the a.out file is generated, so they cannot be evaluated at
>load time.

Newer compilers will presumably include a command-line flag instructing
them to produce either the multiply/divide instructions themselves or
calls to ".mul"/".div" and company.

And such calls are surely *not* expanded when the "a.out" file is
generated, unless you linked with "-Bstatic" - shared libraries,
remember?

>I guess they can be expanded to a call to a shared library routine (I
>don't know how this mechanism works), so libC.a is not linked in until
>run time,

Exactly.

>But I don't know if shared libraries are included in the ABI.

If the ABI is anything like the ones for which I've seen drafts, not
only are they included, they are *required* - i.e., the way you do a
"stat()" call in an ABI-conforming application is you make a call to the
dynamically-linked routine "stat()" in the appropriate library, passing
it certain arguments. You don't shove specified stuff into registers
and execute trap # N.

The same could apply to ".mul"/".div", and probably *would* apply (the
drafts I saw hadn't gotten around to specifying those particular
routines yet; they came in the processor-specific part of the ABI).

(It would also apply to, e.g.
"getpwnam()" and company, so ABI implementations will pick up the local
"getpwnam()"-and-company implementation, whether it be a linear scan
through "/etc/passwd", a "dbm"-based implementation like 4.3BSD's, a
Hesiod-based implementation, an HPollo Registry-based implementation, a
YP-based implementation, etc.)

>Well lets assume that via some mechanism, multiplies are performed by an
>undefined function that is linked in at load time. Then the best you can do
>is cop a function call then a multiply instruction (possibly with moves to
>and from a co processor). I guess this is not too bad as function calls are
>pretty fast on a SPARC anyway.

Yes, if you want an ABI-conforming program. If speed, rather than
shrink-wrap portability, is important, you could build the program with
whatever the "generate multiply/divide in line" flag is. If both are
important, either:

    1) portability to *all* SPARC-based machines *isn't* important
       (e.g., an application that won't work fast enough if you
       don't have multiply/divide instructions), in which case you
       might be able to build your program with the "generate
       multiply/divide in line" flag, and label it "this should run
       on any ABI-conforming machine that *also* has the standard
       multiply/divide instructions".

    2) portability to all SPARC-based machines is important, but so
       is getting the extra performance from the instructions if
       they're there, in which case you may build two versions.

These do somewhat conflict with the principle of the ABI, but life isn't
always perfect (are there PC applications that demand floating-point
hardware, or applications that come in multiple versions, one of which
does and one of which doesn't?).

>But what about multiplies by constants, the compiler will have turned these
>into wonderful sequences of shifts and adds. If a mul instruction became
>available, these should be replaced by a load with constant followed by a
>multiply.

Why?
Do you have hard evidence (not guesses) to suggest that a load with
constant followed by a multiply will be faster than the sequence of
shifts and adds? Note that the Sun 68020 compiler generates sequences
of shifts and adds for multiplies by constants *even though the 68020
has a 32x32 multiply instruction*; I don't think this was done because
the compiler writers were too lazy to change the compiler, I think it
was done because it was *still* faster to do the shifts and adds. I
can't speak for MIPS, but I wouldn't be surprised to hear that even
though they had a multiply instruction since Day 1 they still did shifts
and adds for multiplies by constants.
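
For anybody who hasn't seen what "shifts and adds for multiplies by
constants" looks like, here is a minimal C sketch of a multiply by 10
written both ways. It is not taken from any actual compiler's output;
the function names "mul10_shift_add" and "mul10_multiply" are made up
for illustration, and the choice of 10 as the constant is arbitrary.
It shows only that the two forms compute the same result, not which one
is faster on a given machine.

```c
#include <stdint.h>

/* Hypothetical helper: x * 10 written as two shifts and an add.
 * 10 = 8 + 2, so x * 10 = (x << 3) + (x << 1).
 * Unsigned arithmetic is used so that wraparound is well defined. */
uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Hypothetical helper: the same product via the multiply operator,
 * which a compiler may turn into a multiply instruction, a library
 * call, or its own shift-and-add sequence. */
uint32_t mul10_multiply(uint32_t x)
{
    return x * 10u;
}
```

Both functions agree for every 32-bit input, including inputs where the
product wraps around, since both are computed modulo 2^32. Whether the
first or the second wins is a question for measurement on the machine
in question.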