Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!husc6!ogccse!blake!lgy
From: lgy@blake.acs.washington.edu (Laurence Yaffe)
Newsgroups: comp.arch
Subject: Re: John von Neumann, sqrt instr
Message-ID: <3312@blake.acs.washington.edu>
Date: 19 Aug 89 03:20:41 GMT
References: <21353@cup.portal.com> <25643@obiwan.mips.COM> <1513@l.cc.purdue.edu> <2376@wyse.wyse.com>
Reply-To: lgy@newton.phys.washington.edu (Laurence Yaffe)
Distribution: na
Organization: University of Washington, Seattle
Lines: 49

In article <2376@wyse.wyse.com> stevew@wyse.UUCP (Steve Wilson xttemp dept303) writes:
- Ah, but that presumes that hardware division is warranted!
-
- Does the occurrence rate of divide/square root in scientific computing
- justify the cost?
-
- How does the scientific computing community feel about this functionality?
-
- Steve Wilson

From a user's perspective (having designed and used quite a range of
scientific applications):

1) Whether or not divide or square root is done in hardware (per se) is
   irrelevant. What matters is the ratio of their speed to that of
   multiplies.

2) I'm content if divide is (typically) no more than 5 times slower than
   multiply. Most of my programs probably wouldn't care if divide were 100
   times slower than multiply, but that's not good enough: a few of my
   important programs do need a better divide.

3) 64 bit floating multiply should be no more than 4 or 5 times slower
   than integer addition. This is critical.

4) 32 bit floating point operations are almost irrelevant, and design
   tradeoffs which speed up 32 bit operations at the price of slowing down
   64 bit ops are a mistake.

5) I'm not a big user of square roots. As long as square roots cost less
   than 100 multiplies, I don't think I'd trade faster square roots for
   slower performance in other ops.

6) As always, what REALLY matters is the speed of running real programs.
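Point 1 can be made concrete with a sketch of one standard way to divide without a hardware divider: Newton-Raphson iteration on the reciprocal. This technique is not described in the post, and the function name `divide_by_newton`, the linear starting guess 48/17 - (32/17)m, and the fixed four iterations are illustrative assumptions. The counter tallies only floating multiplies (adds and exponent scaling are ignored). Each step squares the relative error, so starting from a worst-case error of 1/17, four steps reach roughly double precision. That makes one divide cost about ten multiplies plus a few adds, which is why the divide/multiply speed ratio is the number that matters. A similar iteration exists for the reciprocal square root.

```python
import math

# Assumed starting-guess constants for 1/m on [0.5, 1): x0 = 48/17 - (32/17)*m.
# These are computed once at import time, not per division.
C0 = 48.0 / 17.0
C1 = 32.0 / 17.0


def divide_by_newton(a, b, iterations=4):
    """Hypothetical sketch: return (a / b, multiply_count) for finite nonzero b,
    using only multiplies, subtracts and exponent scaling (no divide)."""
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(b))      # abs(b) = m * 2**e with 0.5 <= m < 1
    x = C0 - C1 * m                # initial guess for 1/m, relative error <= 1/17
    multiplies = 1
    for _ in range(iterations):
        x = x * (2.0 - m * x)      # Newton step for f(x) = 1/x - m: 2 multiplies
        multiplies += 2
    recip = math.ldexp(x, -e)      # 1/abs(b) = (1/m) * 2**-e (exponent shift only)
    q = a * recip                  # final multiply by the numerator
    multiplies += 1
    return (q if b > 0 else -q), multiplies


q, n = divide_by_newton(1.0, 3.0)
print(q, n)
```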
I find the ratio of instruction timings on MIPS machines fairly close to
optimum: I wouldn't trade my M/2000 for a machine with faster integer
performance but significantly slower floating point, or faster divides but
slower multiplies, etc. However, I do wish that (a) integer multiplication
were as fast as floating point multiplication, and (b) it were possible to
tell the hardware to set floating point underflows to zero without
generating a trap (which gets handled by software).

When I was choosing a new machine a year ago, I down-rated Sun 4's for poor
floating point performance (compared to integer), and down-rated the Apollo
DN1000 for poor integer performance (compared to floating point).

-- 
Laurence G. Yaffe            Internet: lgy@newton.phys.washington.edu
University of Washington     Bitnet: yaffe@uwaphast.bitnet