Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbvax!WATSON.IBM.COM!jbs
From: jbs@WATSON.IBM.COM
Newsgroups: comp.arch
Subject: float/float = integer, remainder
Message-ID: <9105150230.AA26299@ucbvax.Berkeley.EDU>
Date: 15 May 91 00:14:45 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 57


         Peter Montgomery gives an example (computing sin(1234.) in
4 digit decimal) which purports to show how to use a float/float ->
integer, remainder instruction for argument reduction.  There are some
problems with his method (as I will discuss later).  In any case it is
besides the point.  I am not claiming you can not use this instruction
for argument reduction.  I am claiming it will not enable you to do
argument reduction any faster.  This is because it can be expected to run
slower than floating divide.  Floating divides are so slow on most recent
machines (as compared to floating multiplies or adds) that it pays to go
to great lengths to avoid them.  In this case this is easy to do as we
are dividing by a constant so we can multiply by the reciprocal instead.
         Consider for example how argument reduction is done for the
exp routine (as we will see this is the easier case).  We are using the
identity exp(x)=exp(n*log(2)+(x-n*log(2))=(2**n)*exp(y) where y=x-n*
log(2).  We want y as near to 0 as possible so we choose n to be the
nearest integer to x/log(2).  We will compute exp(y) by using a mini-
max polynomial approximation on the interval (-.5*ln(2),.5*ln(2)).  If
we actually use a minimax approximation valid on a slightly bigger in-
terval (say (-.35,.35)) than we can afford to be slightly sloppy in
choosing n.  Hence we take n to be the nearest integer to x*(1./log(2)).
Now we must compute x-n*log(2) without loss of accuracy due to cancell-
ation.  In this case this is easy.  Note n is at most 11 bits long (else
exp(x) will overflow or underflow; we are assuming IEEE 64-bit arithme-
tic). So write ln(2)=z1+z2 where z1 contains the first 42 bits of ln(2)
and z2 the next 53.  Then y=(x-n*z1)-n*z2 computes y accurately.  Also
this entire sequence will be faster than a single floating divide on
either a 3090 mainframe or a Risc System 6000.
         For trigonometric functions we let n be the closest integer to
x/(pi/2) and y=x-n*(pi/2).  The computation of n and y is trickier for
reasons:
         1)  n may be very large.
         2)  y must be computed with small relative error not just small
absolute error (since sin has a zero at zero).
         Peter Montgomery's method does not deal with these problems.
Since the quotient is returned as an integer it will overflow whenever
n>2**31.  His example assumes his integers have as many digits as his
floats.  If the integers were 2 digits long then 785 would overflow.
His example also assumes 8 digits of pi/2 suffice for accurate argument
argument reduction of 4 digit numbers.  This is not true as can be seen
by trying sin(355.).  Finally in computing 785*pi2, 785 must be convert-
ed back to floating point so you still have conversions.
         Problem 1 can be dealt with by requiring abs(x) to be not too
large.  Bigger arguments can be forbidden or handled by alternative
slower code.  Since almost all inputs to the trig functions lie near zero
this will not affect the average performance.
         Problem 2 can be dealt with be dividing pi/2 into more than 2
parts.  The precision of pi/2 required increases as allowed values of n
increase.  Note if you have 3 parts pi1, pi2 and pi3 you must compute
t-n*pi2 accurately as well as t=x-n*pi1.
         The general problem of computing x-y*z accurately even when y*z
is near x can handled in several ways without the new instruction.  On
machines such as the Risc System 6000 with a fused multiply-add there is
no problem, the multiply-add instruction does the right thing.  On ma-
chines that can return the low part of y*z you just subtract the high
part first, then the low part.
                          James B. Shearer