Path: utzoo!utgpu!attcan!uunet!yale!mfci!colwell
From: colwell@mfci.UUCP (Robert Colwell)
Newsgroups: comp.arch
Subject: HW sqrt/div (was RISC v. CISC --more misconceptions)
Message-ID: <544@m3.mfci.UUCP>
Date: 3 Nov 88 13:41:50 GMT
References: <156@gloom.UUCP> <18931@apple.Apple.COM> <40@sopwith.UUCP> <19762@apple.Apple.COM> <1002@l.cc.purdue.edu> <19811@apple.Apple.COM>
Sender: colwell@mfci.UUCP
Reply-To: colwell@mfci.UUCP (Robert Colwell)
Organization: Multiflow Computer Inc., Branford Ct. 06405
Lines: 21

In article <19811@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>Square root is the same category as divide. Hardware is slow, so algorithms
>tend to avoid them. The reason is fundamental. The hardware is slow, and it
>is exceedingly difficult to make it faster. Strangely enough, floating point
>divide can be made to run much faster, because of its normalized operands.

One of the biggest problems with hardware sqrt/divide is that the
implementations want to be iterative, which makes these ops
non-pipelineable.  That's a very bad feature in machines where all
the other arithmetic ops, especially floating-point multiplies and
adds, are pipelined.  A software implementation of sqrt or div is
built from those pipelined ops, so the latency of any single sqrt
or div is higher, but several independent ones can be in flight at
once and the net throughput is much better.  Of course, the
hardware can get you the last bit correctly rounded to IEEE
specifications; the software could too, in principle, but I've not
yet seen anyone do it.
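
To make the software approach concrete, here is a rough sketch of
divide and sqrt built from nothing but multiplies, adds and
subtracts, using Newton-Raphson iteration.  The technique is my
choice for illustration, not necessarily what any particular
compiler or machine does, and the names nr_reciprocal, nr_divide
and nr_sqrt, the seed values and the iteration counts are all
assumptions.  It handles positive finite inputs only and makes no
attempt at the correctly rounded last bit.

```python
import math

def nr_reciprocal(d, iterations=5):
    """Approximate 1/d for d > 0 using only multiplies and subtracts."""
    # Split d = m * 2**e with 0.5 <= m < 1 so one seed works for all inputs.
    m, e = math.frexp(d)
    # Linear seed for 1/m on [0.5, 1); relative error is at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Newton step for f(x) = 1/x - m; the error roughly squares each pass.
        x = x * (2.0 - m * x)
    # Undo the scaling: 1/d = (1/m) * 2**(-e).
    return math.ldexp(x, -e)

def nr_divide(n, d, iterations=5):
    """Approximate n/d as n * (1/d)."""
    return n * nr_reciprocal(d, iterations)

def nr_sqrt(a, iterations=7):
    """Approximate sqrt(a) for a > 0 via a reciprocal-square-root iteration."""
    m, e = math.frexp(a)
    if e % 2:
        # Make the exponent even so it can be halved exactly.
        m *= 2.0
        e -= 1
    # Now 0.5 <= m < 2; a seed of 1.0 for 1/sqrt(m) converges on that range.
    y = 1.0
    for _ in range(iterations):
        # Newton step for f(y) = 1/y**2 - m; no divide is needed.
        y = y * (1.5 - 0.5 * m * y * y)
    # sqrt(m) = m * (1/sqrt(m)); then restore half of the exponent.
    return math.ldexp(m * y, e // 2)
```

Each pass through either loop is a short chain of multiplies and
adds, so on a pipelined machine the iterations for several
independent operands can be interleaved.  That interleaving is
where the throughput comes from, even though one result alone takes
longer than a dedicated divider would.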

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090