Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!uxa.cso.uiuc.edu!afgg6490
From: afgg6490@uxa.cso.uiuc.edu
Newsgroups: comp.arch
Subject: Re: Intel 860 Architecture
Message-ID: <112400013@uxa.cso.uiuc.edu>
Date: 12 Dec 89 06:22:00 GMT
References: <3818@convex.UUCP>
Lines: 49
Nf-ID: #R:convex.UUCP:3818:uxa.cso.uiuc.edu:112400013:000:2229
Nf-From: uxa.cso.uiuc.edu!afgg6490    Dec 12 00:22:00 1989


..> Reciprocal approximation

Extracting from the comparative tables in my recently completed
survey of computer arithmetic (yeah, I know, I wish they were much more
complete too):

Cray machines use Newton-Raphson reciprocal approximation.

Reciprocal approximation was designed into the Advanced Astronautics
ZS-1, but was replaced in the final design by commodity chips. 
IEEE format, but not IEEE accuracy. This algorithms special feature
was ommitting mantissa bits that were fixed in value in the intermediate
stages to obtain a 1/x approximately 20% more accurate that Cray's algorithm.

Reciprocal approximation was used in the Gould NP1, with hexadecimal
floating point.  Accuracy problems (in large part caused by the
hex format) caused a real divide to be backpatched in.

The AT&T DSP32C uses NR with a seed instruction and SW iteration for
its special 40 bit FP format.

Motorola MC960002 also uses Newton-Raphson with a seed instruction.

(Cyrix, as mentioned in an earlier post, uses NR to compute an
approximate 1/X for use in its 17 bit digit selection divide).


COMMENTS
--------
    Use of N-R reciprocal algorithms was set back quite a bit by
IEEE FP's exactness requirements: a true remainder is needed for
correct rounding.   Also, in non-binary floating point (decimal or hex)
accuracy problems when division is implemented by reciprocal multiplication
can be extreme.   Note that NR reciprocals typically converge from below.
Most other problems with reciprocal are caused by laziness - implementations
that do not bring the result out to the last possible bit.
    With a fast enough multiplier, however, the true remainder can be
computed with only a little extra work (for Y/X:  Q=Y*(1/X); R=Y-Q*X;
QQ = Q +/- delta dependeng on comparison of R and X).
    I have already heard compiler guys say that reciprocal instructions
opened up possibilities for optimization that are hidden by the
assymmetry of divide.
    I see some possibility of using both an explicit reciprocal and a 
remainder correction to advantage, as separable operations:
(1) a reciprocal instruction; (2) a remainder correction instruction to acheive
a true divide result.  (2) Can be optimized away, or combined when
redundant.