Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!uakari.primate.wisc.edu!uflorida!haven!uvaarpa!hudson!astsun8.astro.Virginia.EDU!cak3g From: cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch) Newsgroups: comp.sys.mac.programmer Subject: How to Cosine Message-ID: <2157@hudson.acc.virginia.edu> Date: 23 Oct 89 04:20:43 GMT Sender: news@hudson.acc.virginia.edu Reply-To: cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch) Organization: University of Virginia, Charlottesville Lines: 139 It is difficult to produce a television documentary that is both incisive and probing when every twelve minutes one is interrupted by twelve dancing rabbits singing about toilet paper. -- Rod Serling __________________________________________________________________ WARNING: assembly language discussion ahead. . . (Ah, I can hear even now the sound of a million fingers pressing the "N" key. . .) I am writing a library of three-dimensional graphics routines similar to the Graf3D package supplied by Apple. However, I desire more precision than these routines use. I have chosen to represent my angles as long integers, with the angle being expressed in milli-arcseconds (with 360*3600*1000 milli-arcseconds to a circle). I am also using 64 bit integers (like the SANE comp format) to represent my coordinates. In the interest of speed, I desire to write a subroutine with the following properties: 1) on entry, D0 is an angle in milli-arcseconds (signed long integer) 2) on exit, D0 is the cosine of the given angle (fract format, accurate to the last bit) 3) a lookup table is allowable, but it should be no larger than 10KB, and preferably only a few KB The past few months of research have yielded these thoughts: Pretty obviously we can use the fact that the cosine function is self-similar. Thus the problem reduces to finding the cosine of an angle between 0 and 90 degrees (or between 0 and 324,000,000 marcsec). The cosines of other angles can be easily derived if this sub-problem has been solved. Although I expect to use a lookup table, linear interpolation seems unfruitful. To get thirty bits of accuracy (as stipulated in Property 2) from linear interpolation would require many thousands of table entries, even if the table density varied over the range. These table entries will likely be six bytes each, in the form: b47 b46 . b45 b44 b43 b42 b41 . . . b3 b2 b1 b0 | | | | | | | ---- weight 1/4 | | -------- weight 1/2 | -------------- weight 1 ------------------ weight -2 (sign bit) Why this format? If I want an accurate result in fract format, I must do my scratch calculations in a higher precision and truncate the final result. The above format has the advantages that: a) it's an even number of bytes, which fits nicely into memory, b) to reduce the final result to fract format, all I have to do is lop off the lower 16 bits. Anyway, at six bytes per entry, this limits our table to about a two thousand entries maximum. Extending linear interpolation to the general case, we can use the Gregory-Newton interpolation method, which goes: Given a table of values for a function f, with each table entry separated by a constant distance h, approximate f(x) by: f(x) = D0 + D1*t + D2*t*(t-1) + D3*t*(t-1)*(t-2) + ... ---------- ---------------- 2! 3! where x0 is the nearest multiple of h below x, t = (x-x0)/h, and: D0 = f(x0) D1 = f(x0+h) - f(x0) D2 = f(x0+2h) - 2*f(x0+h) + f(x0) D3 = f(x0+3h) - 3*f(x0+2h) + 3*f(x0+h) - f(x0) (and so on, using Pascal's triangle) Using any terms higher than the third, however, requires dividing by three, which (you'll note) is not a power of 2. Thus this series should probably be limited to the first three terms for fast use in machine language. This method so far seems to be my most hopeful route. But why use a table at all? For the sake of speed, remember. I could probably plug the given angle into some kind of polynomial, but if I'm doing this on my own, then all the multiplications will be multiple precision (because mulu.l returns a 64b result), and I'll have to worry about multiplying a signed 64b number by a 32b number, keeping track of the decimal point, accuracy of the coefficients, etc. I have already tried doing this, but my routine (even though it wasn't quite finished) was only 2.3 times faster than converting the angle to SANE extended (in radians) and calling FCOSX. This either says that my routine is surprisingly slow or that SANE is surprisingly fast. Of course, this was on a Mac II, but still... So how do the professionals do it? Good question. FracCos and FracSin are two routines which quickly calculate those two trig functions, although they take their angles in radians (fixed format) and their results are accurate to about 24 fraction bits, not 30. I've tried dissasembling these routines, but their method eludes me. (Trying to understand raw machine code is not a pleasant experience...) And again, I'm avoiding SANE and floating point numbers in general, so there's no point in investigating how FCOSX works, although I would be curious anyway. SANE undoubtedly uses a polynomial of some kind (Chebyshev, perhaps?) and does all of it's calculations in floating point. By the way, FracCos runs about 9 times faster than FCOSX (assuming equivalent angles) on a Mac II, and 20 times faster on a Mac SE. Just thought you might like to know. But is it even possible for me to beat FracCos? I'm hoping so, because it doesn't seem to use any lookup tables, and that (generally) is a fast way of getting the values of functions. Sources I have read: _The Art of Computing_ by Donald Knuth _Microcomputer Animation_ (?) by Bruce Artwick _Apple Numerics Manual_ by Apple themselves _Numerical Recipes_ by Press et. al. various obtuse textbooks on numerical analysis And then of course, there's the rest of this damn project, but God knows when I'll ever have the time. . . ______________________________________________________________________ "May the forces of evil become confused on the way to your house." -- George Carlin Sincerely, | DISCLAIMER: Colin Klipsch | Every word in this text is University of Virginia | actually a horrendous misspelling (cak3g@astsun.astro.virginia.edu) | of the word "fettuccini".