Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!ucsd!swrinde!cs.utexas.edu!wuarchive!brutus.cs.uiuc.edu!apple!motcsd!xdos!doug
From: doug@xdos.UUCP (Doug Merritt)
Newsgroups: comp.graphics
Subject: Re: APPROXIMATING SQUARE_ROOT
Message-ID: <542@xdos.UUCP>
Date: 27 Nov 89 16:46:09 GMT
References: <1989Oct26.155201.5087@rpi.edu> <14397@well.UUCP> <367@oscsuna.osc.edu> <63127@psuecl.bitnet>
Reply-To: doug@xdos.UUCP (Doug Merritt)
Distribution: usa
Organization: Hunter Systems, Mountain View CA (Silicon Valley)
Lines: 39

In article <63127@psuecl.bitnet> c9h@psuecl.bitnet writes:
>Even with floating point numbers, and divide by 2 can be extremely fast
>if coded such that the exponent is simply decremented by one.  I would hope
>that many compilers do this optimization automatically, but I don't know.

This breaks when the number is not an exact power of two, because you're
not doing anything to the mantissa. So there'd better not be any compilers
that do this.

>On the other hand, if you want to take the square root of an integer, you
>could simply keep subtracting successive odd numbers until the next odd
>number is larger than your accumulator.  The number of odd numbers thus
>subtracted is the greatest integer function of the square root.  i.e.:

This is O(sqrt(N)), whereas e.g. binary search is O(log(N)). I would think
that this might be a deoptimization on machines with an efficient
multiply. Binary search requires an inner loop multiplication, but yours
requires lots more worst-case iterations.

For 16 bit integers, your algorithm's worst case is 255 iterations,
where binary search is 16. So the break-even point is a machine where
16-bit multiplication is 15 to 16 times slower than addition. (I've
misplaced my 68000 manual, but from memory I think the ratio of times for
addition versus multiplication make it approximately break-even for
that processor.)

For 32-bit integers, your worst case is 65535 iterations, which couldn't
possibly beat binary search's 32 iterations on any machine, even with
mildly inefficient software-synthesized multiplication.

But if your inputs are limited to say, 2^10 or less, I guess it could
be a win.

Clever algorithm for the right circumstances, but not a general case
optimization.
	Doug
-- 
Doug Merritt		{pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow		Professional Wildeyed Visionary


Brought to you by Super Global Mega Corp .com