Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!WATSON.IBM.COM!jbs From: jbs@WATSON.IBM.COM Newsgroups: comp.arch Subject: float/float = integer, remainder Message-ID: <9105150230.AA26299@ucbvax.Berkeley.EDU> Date: 15 May 91 00:14:45 GMT Sender: daemon@ucbvax.BERKELEY.EDU Lines: 57 Peter Montgomery gives an example (computing sin(1234.) in 4 digit decimal) which purports to show how to use a float/float -> integer, remainder instruction for argument reduction. There are some problems with his method (as I will discuss later). In any case it is besides the point. I am not claiming you can not use this instruction for argument reduction. I am claiming it will not enable you to do argument reduction any faster. This is because it can be expected to run slower than floating divide. Floating divides are so slow on most recent machines (as compared to floating multiplies or adds) that it pays to go to great lengths to avoid them. In this case this is easy to do as we are dividing by a constant so we can multiply by the reciprocal instead. Consider for example how argument reduction is done for the exp routine (as we will see this is the easier case). We are using the identity exp(x)=exp(n*log(2)+(x-n*log(2))=(2**n)*exp(y) where y=x-n* log(2). We want y as near to 0 as possible so we choose n to be the nearest integer to x/log(2). We will compute exp(y) by using a mini- max polynomial approximation on the interval (-.5*ln(2),.5*ln(2)). If we actually use a minimax approximation valid on a slightly bigger in- terval (say (-.35,.35)) than we can afford to be slightly sloppy in choosing n. Hence we take n to be the nearest integer to x*(1./log(2)). Now we must compute x-n*log(2) without loss of accuracy due to cancell- ation. In this case this is easy. Note n is at most 11 bits long (else exp(x) will overflow or underflow; we are assuming IEEE 64-bit arithme- tic). So write ln(2)=z1+z2 where z1 contains the first 42 bits of ln(2) and z2 the next 53. Then y=(x-n*z1)-n*z2 computes y accurately. Also this entire sequence will be faster than a single floating divide on either a 3090 mainframe or a Risc System 6000. For trigonometric functions we let n be the closest integer to x/(pi/2) and y=x-n*(pi/2). The computation of n and y is trickier for reasons: 1) n may be very large. 2) y must be computed with small relative error not just small absolute error (since sin has a zero at zero). Peter Montgomery's method does not deal with these problems. Since the quotient is returned as an integer it will overflow whenever n>2**31. His example assumes his integers have as many digits as his floats. If the integers were 2 digits long then 785 would overflow. His example also assumes 8 digits of pi/2 suffice for accurate argument argument reduction of 4 digit numbers. This is not true as can be seen by trying sin(355.). Finally in computing 785*pi2, 785 must be convert- ed back to floating point so you still have conversions. Problem 1 can be dealt with by requiring abs(x) to be not too large. Bigger arguments can be forbidden or handled by alternative slower code. Since almost all inputs to the trig functions lie near zero this will not affect the average performance. Problem 2 can be dealt with be dividing pi/2 into more than 2 parts. The precision of pi/2 required increases as allowed values of n increase. Note if you have 3 parts pi1, pi2 and pi3 you must compute t-n*pi2 accurately as well as t=x-n*pi1. The general problem of computing x-y*z accurately even when y*z is near x can handled in several ways without the new instruction. On machines such as the Risc System 6000 with a fused multiply-add there is no problem, the multiply-add instruction does the right thing. On ma- chines that can return the low part of y*z you just subtract the high part first, then the low part. James B. Shearer