Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!uxa.cso.uiuc.edu!afgg6490 From: afgg6490@uxa.cso.uiuc.edu Newsgroups: comp.arch Subject: Re: Intel 860 Architecture Message-ID: <112400013@uxa.cso.uiuc.edu> Date: 12 Dec 89 06:22:00 GMT References: <3818@convex.UUCP> Lines: 49 Nf-ID: #R:convex.UUCP:3818:uxa.cso.uiuc.edu:112400013:000:2229 Nf-From: uxa.cso.uiuc.edu!afgg6490 Dec 12 00:22:00 1989 ..> Reciprocal approximation Extracting from the comparative tables in my recently completed survey of computer arithmetic (yeah, I know, I wish they were much more complete too): Cray machines use Newton-Raphson reciprocal approximation. Reciprocal approximation was designed into the Advanced Astronautics ZS-1, but was replaced in the final design by commodity chips. IEEE format, but not IEEE accuracy. This algorithms special feature was ommitting mantissa bits that were fixed in value in the intermediate stages to obtain a 1/x approximately 20% more accurate that Cray's algorithm. Reciprocal approximation was used in the Gould NP1, with hexadecimal floating point. Accuracy problems (in large part caused by the hex format) caused a real divide to be backpatched in. The AT&T DSP32C uses NR with a seed instruction and SW iteration for its special 40 bit FP format. Motorola MC960002 also uses Newton-Raphson with a seed instruction. (Cyrix, as mentioned in an earlier post, uses NR to compute an approximate 1/X for use in its 17 bit digit selection divide). COMMENTS -------- Use of N-R reciprocal algorithms was set back quite a bit by IEEE FP's exactness requirements: a true remainder is needed for correct rounding. Also, in non-binary floating point (decimal or hex) accuracy problems when division is implemented by reciprocal multiplication can be extreme. Note that NR reciprocals typically converge from below. Most other problems with reciprocal are caused by laziness - implementations that do not bring the result out to the last possible bit. With a fast enough multiplier, however, the true remainder can be computed with only a little extra work (for Y/X: Q=Y*(1/X); R=Y-Q*X; QQ = Q +/- delta dependeng on comparison of R and X). I have already heard compiler guys say that reciprocal instructions opened up possibilities for optimization that are hidden by the assymmetry of divide. I see some possibility of using both an explicit reciprocal and a remainder correction to advantage, as separable operations: (1) a reciprocal instruction; (2) a remainder correction instruction to acheive a true divide result. (2) Can be optimized away, or combined when redundant.