Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!swrinde!cs.utexas.edu!wuarchive!brutus.cs.uiuc.edu!apple!motcsd!xdos!doug From: doug@xdos.UUCP (Doug Merritt) Newsgroups: comp.graphics Subject: Re: APPROXIMATING SQUARE_ROOT Message-ID: <542@xdos.UUCP> Date: 27 Nov 89 16:46:09 GMT References: <1989Oct26.155201.5087@rpi.edu> <14397@well.UUCP> <367@oscsuna.osc.edu> <63127@psuecl.bitnet> Reply-To: doug@xdos.UUCP (Doug Merritt) Distribution: usa Organization: Hunter Systems, Mountain View CA (Silicon Valley) Lines: 39 In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes: >Even with floating point numbers, and divide by 2 can be extremely fast >if coded such that the exponent is simply decremented by one. I would hope >that many compilers do this optimization automatically, but I don't know. This breaks when the number is not an exact power of two, because you're not doing anything to the mantissa. So there'd better not be any compilers that do this. >On the other hand, if you want to take the square root of an integer, you >could simply keep subtracting successive odd numbers until the next odd >number is larger than your accumulator. The number of odd numbers thus >subtracted is the greatest integer function of the square root. i.e.: This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think that this might be a deoptimization on machines with an efficient multiply. Binary search requires an inner loop multiplication, but yours requires lots more worst-case iterations. For 16 bit integers, your algorithm's worst case is 255 iterations, where binary search is 16. So the break-even point is a machine where 16-bit multiplication is 15 to 16 times slower than addition. (I've misplaced my 68000 manual, but from memory I think the ratio of times for addition versus multiplication make it approximately break-even for that processor.) For 32-bit integers, your worst case is 65535 iterations, which couldn't possibly beat binary search's 32 iterations on any machine, even with mildly inefficient software-synthesized multiplication. But if your inputs are limited to say, 2^10 or less, I guess it could be a win. Clever algorithm for the right circumstances, but not a general case optimization. Doug -- Doug Merritt {pyramid,apple}!xdos!doug Member, Crusaders for a Better Tomorrow Professional Wildeyed Visionary Brought to you by Super Global Mega Corp .com