Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ncar!husc6!ogccse!blake!lgy
From: lgy@blake.acs.washington.edu (Laurence Yaffe)
Newsgroups: comp.arch
Subject: Re: John von Neumann, sqrt instr
Message-ID: <3312@blake.acs.washington.edu>
Date: 19 Aug 89 03:20:41 GMT
References: <21353@cup.portal.com> <25643@obiwan.mips.COM> <1513@l.cc.purdue.edu> <2376@wyse.wyse.com>
Reply-To: lgy@newton.phys.washington.edu (Laurence Yaffe)
Distribution: na
Organization: University of Washington, Seattle
Lines: 49

In article <2376@wyse.wyse.com> stevew@wyse.UUCP (Steve Wilson xttemp dept303) writes:

- Ah, but that presumes that hardware division is warranted!
- 
- Does the occurrence rate of divide/square root in scientific computing
- justify the cost?
- 
- How does the scientific computing community feel about this functionality?
- 
- Steve Wilson

    From a user's perspective (having designed and used quite a range
of scientific applications):

1) Whether or not divide or square root is done in hardware (per se) is
   irrelevant.  What matters is the ratio of their speed to that of multiplies.

2) I'm content if divide is (typically) no more than 5 times slower
   than multiply.  Most of my programs probably wouldn't care if
   divide were 100 times slower than multiply, but that's not good
   enough: a few of my important programs do need a faster divide.

3) 64-bit floating-point multiply should be no more than 4 or 5 times
   slower than integer addition.  This is critical.

4) 32-bit floating-point operations are almost irrelevant, and design
   tradeoffs which speed up 32-bit operations at the price of slowing
   down 64-bit operations are a mistake.

5) I'm not a big user of square roots.  As long as square roots cost
   less than 100 multiplies, I don't think I'd trade faster square roots
   for slower performance in other ops.

6) As always, what REALLY matters is the speed of running real programs.
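
Points 1 and 5 treat divide and square root purely as a cost measured in multiplies.  To make that concrete, here is a minimal Python sketch (not from the post; the function names, starting guesses, and iteration counts are illustrative assumptions) that builds both operations from multiplies and subtracts by Newton-Raphson iteration and counts the multiplies spent.  It assumes finite, positive arguments and uses `math.frexp`/`math.ldexp` only for exact exponent scaling, which costs no multiply.

```python
import math

# Precomputed constants for the linear starting guesses (not counted as work).
_R0, _R1 = 48.0 / 17.0, 32.0 / 17.0   # 1/m ~ _R0 - _R1*m on [0.5, 1), rel. error <= 1/17
_S0, _S1 = 7.0 / 3.0, 4.0 / 3.0       # 1/sqrt(m) ~ _S0 - _S1*m on [0.25, 1), rel. error <= ~0.19


def nr_reciprocal(d):
    """Hypothetical helper: approximate 1/d for finite d > 0; returns (value, multiplies)."""
    m, e = math.frexp(d)               # d = m * 2**e with 0.5 <= m < 1
    x = _R0 - _R1 * m                  # starting guess: 1 multiply
    mults = 1
    for _ in range(4):                 # x <- x*(2 - m*x); relative error squares each step
        x = x * (2.0 - m * x)
        mults += 2
    return math.ldexp(x, -e), mults    # 1/d = (1/m) * 2**-e, exact rescaling


def nr_divide(a, b):
    """Hypothetical helper: a / b for b > 0 as a * (1/b); returns (value, multiplies)."""
    r, mults = nr_reciprocal(b)
    return a * r, mults + 1


def nr_sqrt(d):
    """Hypothetical helper: sqrt(d) for finite d > 0 via 1/sqrt iteration; returns (value, multiplies)."""
    m, e = math.frexp(d)               # 0.5 <= m < 1
    if e % 2:                          # make the exponent even so it halves exactly
        m, e = math.ldexp(m, -1), e + 1   # now 0.25 <= m < 1
    y = _S0 - _S1 * m                  # starting guess for 1/sqrt(m): 1 multiply
    h = 0.5 * m                        # 1 multiply
    mults = 2
    for _ in range(5):                 # y <- y*(1.5 - (m/2)*y*y): 3 multiplies per step
        y = y * (1.5 - h * (y * y))
        mults += 3
    return math.ldexp(m * y, e // 2), mults + 1   # sqrt(m) = m * (1/sqrt(m)), then rescale
```

By this count a multiply-only divide costs 10 mostly dependent multiplies, which is above the 5x budget of point 2, while the square root's 18 sits well under the 100-multiply threshold of point 5.  Other starting guesses or iteration schemes would shift these numbers.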

I find the ratio of instruction timings on MIPS machines fairly close to
optimum - I wouldn't trade my M/2000 for a machine with faster integer
performance but significantly slower floating point, or faster divides but
slower multiplies, etc.  However, I do wish that (a) integer multiplication
were as fast as floating-point multiplication, and (b) it were possible to
tell the hardware to set floating-point underflows to zero without generating
a trap (which gets handled by software).  When I was choosing a new machine
a year ago, I down-rated Sun 4s for poor floating-point performance (compared
to integer), and down-rated the Apollo DN1000 for poor integer performance
(compared to floating point).
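
Wish (b) asks for a mode in which an underflowing floating-point result simply becomes zero instead of trapping to a software handler; hardware that offers this usually calls it flush-to-zero.  The rule itself is simple.  Below is a minimal Python sketch that emulates it in software after each multiply, purely for illustration: the function names are hypothetical, and the threshold assumes IEEE 754 double precision.

```python
import math
import sys

# Hypothetical software model (not a real FPU interface) of "set underflows
# to zero without a trap": any nonzero result smaller in magnitude than the
# smallest normal double becomes a zero of the same sign.
SMALLEST_NORMAL = sys.float_info.min   # about 2.2e-308 for IEEE 754 double


def flush_to_zero(x):
    """Return x unchanged, or a signed zero if x is subnormal."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


def ftz_mul(a, b):
    """Multiply, then apply the flush-to-zero rule to the product."""
    return flush_to_zero(a * b)
```

For example, `1e-200 * 1e-120` is about 1e-320, which IEEE 754 doubles can only hold as a subnormal; `ftz_mul(1e-200, 1e-120)` returns 0.0 instead, while `1e-200 * 1e-100` (about 1e-300) is a normal number and passes through unchanged.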

-- 
Laurence G. Yaffe		Internet: lgy@newton.phys.washington.edu
University of Washington	  Bitnet: yaffe@uwaphast.bitnet