Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!agate!bionet!ames!ncar!boulder!sunybcs!rutgers!cmcl2!adm!smoke!gwyn From: gwyn@smoke.BRL.MIL (Doug Gwyn ) Newsgroups: comp.lang.c Subject: Re: Wanted: fast, low-precision trig functions for C Message-ID: <9743@smoke.BRL.MIL> Date: 28 Feb 89 03:40:40 GMT References: <1256@dukeac.UUCP> Reply-To: gwyn@brl.arpa (Doug Gwyn (VLD/VMB) ) Organization: Ballistic Research Lab (BRL), APG, MD. Lines: 51 In article <1256@dukeac.UUCP> tcamp@dukeac.UUCP (Ted A. Campbell) writes: >I am writing a graphics-based program that has to calculate >several thousand map coordinates. ... >I wondered if there were any fast, low-precision approaches >to the standard trig functions, ... The cost of the "extra precision" over a less precise but otherwise similar approximation is typically small. However, there may be hope -- I know of two circumstances under which one can substantially speed up such functions: (1) If all coordinates will be taken from a grid, as when dealing with graphic display devices, then several methods are available for quickly returning answers that are as precise as the grid resolution allows. Unfortunately the ones I know of are implemented as proprietary code. If you have access to sources for graphic devices, e.g. DMD, check there for fast approximate trig functions. (2) For more or less arbitrary floating-point coordinates, if there are only a manageable number of distinct function arguments, then you may be able to "cache" them, either precomputed or computed the first time a value is needed, stashing the value for immediate use the next time the same argument is supplied. For this to work well, there must be a fast method of mapping the argument into the correct cache entry. (a) If the arguments are all regularly spaced, for example integral number of degrees, then the cache can be an array of return values indexed by the integer number of spacing intervals. E.g. double cosval[360] = { 1.0, 0.999848, ... }; (b) If the arguments are arbitrary within some domain, but the function is nicely-behaved, then you can obtain good enough accuracy by converting the argument to some number of intervals, as above, rounding to the nearest integral number, and proceed as above. For example, the slope of cos() is no worse than +-1, so for 0.1% accuracy it would suffice to divide the range [0,Pi/2] into on the order of a thousand bins. Once you start thinking along these lines, you can probably spot several areas for code speed improvement. The key is that the cost to access the cache must be less than the cost of whatever approximation method the standard library is using. If your machine has a "polynomial evaluate" instruction, it might be hard to beat the standard library.