Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!rpi!crdgw1!uunet!shelby!agate!ucbvax!IBM.COM!JBS
From: JBS@IBM.COM
Newsgroups: comp.arch
Subject: bizarre instructions
Message-ID: <9102220245.AA14853@ucbvax.Berkeley.EDU>
Date: 22 Feb 91 00:05:55 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Lines: 76


          Regarding my comment:
          Since argument reduction for the trigonometric functions is
done in practice by multiplying by 1/pi, I do not see how it would
benefit from your proposed instruction.
          Dik Winter says:
You must have a very limited experience.  You do it only by multiplying by
1/pi if you are willing to forego a lot of precision.  (Check your
favorite hand-held calculator.  If sin(pi) = 0, the implementation is
wrong!)

          I don't understand this comment.  This problem can occur
with the divide with remainder instruction as well.  The problem is
that approximating x mod pi by x mod y, where y is the machine
representation of pi, may lose all accuracy.
          Herman Rubin says:
There is somewhat greater loss of accuracy in this, and it is still
needed to extract the integer part to an integer register and the
fractional part.  Thus, it needs at least three operations, instead
of one.  Also, if one is calculating something like x - ln(1+x), a
natural operation in certain problems, the computing problems become
a little larger than one would expect.  In fact, to avoid a loss of
accuracy in some quite usual situations, it would even be a good idea
to have Boolean operations on floats, and unnormalized floating
arithmetic.  Floating arithmetic, apart from some preliminaries,
and normalization problems, is exactly the same as integer, so why
should there be a separate arithmetic unit even?

         I don't understand the comments about loss of accuracy.
As noted above, some or all of the bits of the remainder will be
bogus because pi is not a machine number.  Accurate argument
reduction requires first finding the integer part, n, of the quotient,
then computing x-n*pi carefully using pi to more than machine
precision.  It is more convenient to do this if n is in floating
format.  Counting operations is a poor test of speed when floating
divide typically takes 5 times or more as long as floating
multiply.  I don't understand the reference to x-ln(1+x).  This will
always lose accuracy if evaluated in this form for x near 0.  It
is usually possible to perform Boolean operations on floats,
although it may be a little awkward.  I presume the reason for a
separate floating point unit is that this leads to faster machines.
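
         The reduction described above (find n with a multiply by
1/pi, then form x-n*pi with pi carried to more than machine precision)
might be sketched in C roughly as follows.  This is only an
illustration under stated assumptions: the names reduce_mod_pi, PI_HI,
PI_LO and INV_PI are hypothetical, the two-piece split of pi is one
common way to get the extra precision and is not taken from the post,
and the sketch only covers arguments with |x/pi| below 2^20.

```c
#include <assert.h>

/* Assumed constants (not from the post): pi split into a short head
   and a tail.  PI_HI keeps only the leading bits of pi, so n*PI_HI is
   exact in double for |n| < 2^20.  PI_LO is pi - PI_HI rounded to
   double.  INV_PI is 1/pi rounded to double. */
static const double PI_HI  = 0x1.921FB544p+1;
static const double PI_LO  = 0x1.0B4611A626331p-33;
static const double INV_PI = 0.3183098861837907;

typedef struct {
    long   n;  /* nearest integer to x/pi */
    double r;  /* x - n*pi, using the two-piece pi */
} reduced;

reduced reduce_mod_pi(double x)
{
    reduced out;
    double t = x * INV_PI;  /* a multiply, not a divide */
    double fn;

    /* Assumption of this sketch: n stays small enough for n*PI_HI
       to be exact.  Larger arguments need more pieces of pi. */
    assert(t > -1048576.0 && t < 1048576.0);

    /* Round to nearest; the cast truncates toward zero. */
    out.n = (long)(t < 0.0 ? t - 0.5 : t + 0.5);

    /* Keep n in floating format for the subtraction. */
    fn = (double)out.n;

    /* Remove the exact head product first, then the small tail. */
    out.r = (x - fn * PI_HI) - fn * PI_LO;
    return out;
}
```

With x set to the double nearest pi, this gives n = 1 and a remainder
of roughly -1.22e-16 rather than 0, whereas x - n*y with y the machine
pi gives exactly 0.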
         Regarding my suggestion for an integer*8 type Herman Rubin
comments:
Yes and no.  It would have much more general utility, but it would
do an abysmally inefficient job in this situation.  You would need
to have a way of indicating that the product of two integers*4 is
an integer*8, which I do not know how to do in any language with
which I am familiar without writing it as a function, and I do not
think that one should have to write mult48(a,b) instead of a*b.  In
addition, how would you write the operation which, when applied to
a*b+c and n, yields both q and r?

         Regarding a function to get the 8 byte product of 2
4 byte integers, I don't see why this is any worse than Montgomery's
function.  In any case it is not needed.  Let i4,j4 be 4 byte and
i8,j8,k8 be 8 byte.  Then write
         i8=i4
         j8=j4
         k8=i8*j8
A sufficiently intelligent compiler will do the right thing.  It may
work to write
         k8=i4*j4
This sometimes works in the analogous case where k8 is 4 byte and i4
and j4 are 2 bytes.  However I am not sure strictly speaking that it
should.  What does the Fortran standard say?
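
         For comparison, the same widen-then-multiply idea can be
sketched in C.  This is an analogy rather than an answer to the
Fortran question: the function name mul_4x4_to_8 is hypothetical, and
int32_t/int64_t stand in for the 4 byte and 8 byte integers.

```c
#include <stdint.h>

/* Widen both 4 byte operands first, then multiply in 8 bytes.
   The full product of two 32-bit values always fits in 64 bits. */
int64_t mul_4x4_to_8(int32_t i4, int32_t j4)
{
    int64_t i8 = i4;
    int64_t j8 = j4;
    return i8 * j8;
}
```

In C, on a platform where int is 32 bits, the direct form
k8 = i4 * j4 performs the multiply in 32 bits before the assignment
widens the result, so it can overflow; the explicit widening avoids
that.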
         As for getting both q and r, this is easy: just write
         iq=ia/ib
         ir=mod(ia,ib)
A reasonable compiler will only generate 1 divide with remainder.
I will confess, however, that while this works when everything is
4 bytes, I don't quite see how to make it work when ia is 8 bytes
(since it seems unreasonable to define the quotient of an 8 byte
integer by a 4 byte integer to be 4 bytes, and if it is defined
to be 8 bytes it is then unsafe to use the usual 8 byte by 4 byte
giving 4 byte quotient instruction).
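
         A rough C sketch of both cases follows.  The names qr32,
divrem_4 and divrem_8_by_4 are hypothetical.  The second function
shows one possible way to handle the 8 byte dividend: compute an
8 byte quotient, and narrow it only after checking that it fits in
4 bytes.  Whether a compiler maps this onto the narrow divide
instruction is not something the sketch can promise.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int     ok;  /* 1 if q and r are valid, 0 if q does not fit */
    int32_t q;
    int32_t r;
} qr32;

/* Everything 4 bytes: the quotient and remainder of the same
   operands, written side by side so one divide can serve both. */
qr32 divrem_4(int32_t ia, int32_t ib)
{
    qr32 out;
    /* Division by zero and INT32_MIN / -1 are undefined in C. */
    assert(ib != 0 && !(ia == INT32_MIN && ib == -1));
    out.ok = 1;
    out.q = ia / ib;
    out.r = ia % ib;
    return out;
}

/* 8 byte dividend, 4 byte divisor: divide in 8 bytes, then narrow
   the quotient only if it is representable in 4 bytes. */
qr32 divrem_8_by_4(int64_t ia, int32_t ib)
{
    qr32 out = {0, 0, 0};
    int64_t q8, r8;

    assert(ib != 0);
    if (ia == INT64_MIN && ib == -1)
        return out;              /* quotient overflows even 8 bytes */

    q8 = ia / ib;                /* ib is converted to 8 bytes here */
    r8 = ia % ib;

    if (q8 < INT32_MIN || q8 > INT32_MAX)
        return out;              /* quotient needs more than 4 bytes */

    out.ok = 1;
    out.q = (int32_t)q8;
    out.r = (int32_t)r8;         /* |r8| < |ib|, so it always fits */
    return out;
}
```

For example, divrem_8_by_4(10000000000, 7) yields q = 1428571428 and
r = 4, while divrem_8_by_4(100000000000, 7) reports ok = 0 because
the quotient exceeds 4 bytes.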
                    James B. Shearer

