Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!uakari.primate.wisc.edu!uflorida!haven!uvaarpa!hudson!astsun8.astro.Virginia.EDU!cak3g
From: cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch)
Newsgroups: comp.sys.mac.programmer
Subject: How to Cosine
Message-ID: <2157@hudson.acc.virginia.edu>
Date: 23 Oct 89 04:20:43 GMT
Sender: news@hudson.acc.virginia.edu
Reply-To: cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch)
Organization: University of Virginia, Charlottesville
Lines: 139


It is difficult to produce a television documentary that is both
incisive and probing when every twelve minutes one is interrupted
by twelve dancing rabbits singing about toilet paper.

                                                  -- Rod Serling
__________________________________________________________________

WARNING: assembly language discussion ahead. . .

(Ah, I can hear even now the sound of a million fingers pressing
 the "N" key. . .)

I am writing a library of three-dimensional graphics routines
similar to the Graf3D package supplied by Apple.  However, I
desire more precision than these routines use.  I have chosen
to represent my angles as long integers, with the angle being
expressed in milli-arcseconds (with 360*3600*1000 milli-arcseconds
to a circle).  I am also using 64 bit integers (like the SANE
comp format) to represent my coordinates.

In the interest of speed, I desire to write a subroutine with
the following properties:

   1) on entry, D0 is an angle in milli-arcseconds
      (signed long integer)
   2) on exit, D0 is the cosine of the given angle
      (fract format, accurate to the last bit)
   3) a lookup table is allowable, but it should be no
      larger than 10KB, and preferably only a few KB

The past few months of research have yielded these thoughts:

Pretty obviously we can use the fact that the cosine function
is self-similar.  Thus the problem reduces to finding the
cosine of an angle between 0 and 90 degrees (or between 0 and
324,000,000 marcsec).  The cosines of other angles can be easily
derived if this sub-problem has been solved.

Although I expect to use a lookup table, linear interpolation
seems unfruitful.  To get thirty bits of accuracy (as stipulated
in Property 2) from linear interpolation would require many
thousands of table entries, even if the table density varied
over the range.  These table entries will likely be six bytes
each, in the form:

       b47 b46 . b45 b44 b43 b42 b41 . . . b3 b2 b1 b0
        |   |     |   |
        |   |     |   ---- weight 1/4
        |   |     -------- weight 1/2
        |   -------------- weight 1
        ------------------ weight -2 (sign bit)

Why this format?  If I want an accurate result in fract format,
I must do my scratch calculations in a higher precision and
truncate the final result.  The above format has the advantages
that: a) it's an even number of bytes, which fits nicely into
memory, b) to reduce the final result to fract format, all I
have to do is lop off the lower 16 bits.

Anyway, at six bytes per entry, this limits our table to about
a two thousand entries maximum.

Extending linear interpolation to the general case, we can use
the Gregory-Newton interpolation method, which goes:

     Given a table of values for a function f, with each
     table entry separated by a constant distance h, approximate
     f(x) by:

           f(x) = D0 + D1*t + D2*t*(t-1) + D3*t*(t-1)*(t-2) + ...
                              ----------   ----------------
                                  2!              3!

     where x0 is the nearest multiple of h below x, t = (x-x0)/h, and:

     D0 = f(x0)
     D1 = f(x0+h) - f(x0)
     D2 = f(x0+2h) - 2*f(x0+h) + f(x0)
     D3 = f(x0+3h) - 3*f(x0+2h) + 3*f(x0+h) - f(x0)
     (and so on, using Pascal's triangle)

Using any terms higher than the third, however, requires dividing
by three, which (you'll note) is not a power of 2.  Thus this
series should probably be limited to the first three terms for
fast use in machine language.  This method so far seems to be my most
hopeful route.

But why use a table at all?  For the sake of speed, remember.  I
could probably plug the given angle into some kind of polynomial,
but if I'm doing this on my own, then all the multiplications
will be multiple precision (because mulu.l returns a 64b result),
and I'll have to worry about multiplying a signed 64b number
by a 32b number, keeping track of the decimal point, accuracy
of the coefficients, etc.

I have already tried doing this, but my routine (even though it
wasn't quite finished) was only 2.3 times faster than converting
the angle to SANE extended (in radians) and calling FCOSX.  This
either says that my routine is surprisingly slow or that SANE
is surprisingly fast.  Of course, this was on a Mac II, but still...

So how do the professionals do it?  Good question.  FracCos and
FracSin are two routines which quickly calculate those two
trig functions, although they take their angles in radians (fixed
format) and their results are accurate to about 24 fraction bits,
not 30.  I've tried dissasembling these routines, but their method
eludes me.  (Trying to understand raw machine code is not a
pleasant experience...)

And again, I'm avoiding SANE and floating point numbers in general,
so there's no point in investigating how FCOSX works, although I would
be curious anyway.  SANE undoubtedly uses a polynomial of some kind
(Chebyshev, perhaps?) and does all of it's calculations in floating point.

By the way, FracCos runs about 9 times faster than FCOSX (assuming
equivalent angles) on a Mac II, and 20 times faster on a Mac SE.
Just thought you might like to know.

But is it even possible for me to beat FracCos?  I'm hoping so,
because it doesn't seem to use any lookup tables, and that
(generally) is a fast way of getting the values of functions.

Sources I have read: _The Art of Computing_ by Donald Knuth
                     _Microcomputer Animation_ (?) by Bruce Artwick
                     _Apple Numerics Manual_ by Apple themselves
                     _Numerical Recipes_ by Press et. al.
                     various obtuse textbooks on numerical analysis

And then of course, there's the rest of this damn project, but God
knows when I'll ever have the time. . .
______________________________________________________________________
"May the forces of evil become confused on the way to your house."

                                                         -- George Carlin
Sincerely,                            | DISCLAIMER:
   Colin Klipsch                      |   Every word in this text is
   University of Virginia             |   actually a horrendous misspelling
   (cak3g@astsun.astro.virginia.edu)  |   of the word "fettuccini".