Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!ubc-cs!eric!handel!holm From: holm@handel.mpr.ca (Terrence W. Holm) Newsgroups: comp.arch Subject: Re: John von Neumann, sqrt instr (a DSP56000 example) Message-ID: <1762@eric.mpr.ca> Date: 23 Aug 89 22:37:32 GMT References: <21353@cup.portal.com> <25643@obiwan.mips.COM> <1513@l.cc.purdue.edu> <2376@wyse.wyse.com> <3312@blake.acs.washington.edu> <12640@pur-ee.UUCP> <911@bnr-rsc.UUCP> Sender: news@eric.mpr.ca Reply-To: holm@mprgate.mpr.ca (Terrence W. Holm) Organization: Microtel Pacific Research Ltd., Burnaby, B.C., Canada Lines: 87 In article <911@bnr-rsc.UUCP> drago@bnr-rsc.UUCP (Drago Goricanec) writes: >>In article <3312@blake.acs.washington.edu> (Laurence Yaffe) writes: >>>5) I'm not a big user of square roots. As long as square roots cost >>> less than 100 multiplies, I don't think I'd trade faster square roots >>> for slower performance in other ops. > >Here's a method that computes the square root directly (ie. this is not >an approximation method). I don't remember what it's called, but I >originally saw the method done on decimal numbers. The algorithm is >much simpler on binary numbers. It resembles long division on paper. > >I'll present it in 68030 code, then describe it. > ... >Drago Goricanec (...!bnr-vpa!bnr-rsc!drago) >Bell-Northern Research Ltd., Ottawa, Ontario (613) 763-8957 The algorithm looked interesting to me, so I rewrote it for the Motorola DSP56000. It takes 188 cycles (7 microseconds). [A multiply on this processor takes 2 cycles.] The actual code is below, if anybody cares (should be in comp.dsp). ----------------------------------------------------------------- opt cc ROUND equ 0 ; Set to 0 to enable truncated results ; Set to 1 to enable rounding results org p:$100 ; sqrt() - Square root of an integer ; [Not for fixed point fractional numbers] ; ; Author: Terrence W. Holm, 1989-August-23 ; ; ; Entry: ; a1 = 24 bit unsigned integer ; m0 = $FFFF ; m1 = $FFFF ; ; Exit: ; a1 = Truncated or rounded square root of (a1) ; ; Destroys: ; a2, a1, a0 ; b2, b1, b0 ; x1, x0 ; r1, r0 ; n1, n0 ; ; Cycles: 188, (6.96 us @ 27MHz) for truncated results ; 208, (7.70 us @ 27MHz) for rounded results ; [From original: d0 is not necessary, d1 = b0, d2 = b1, d3 = r0] sqrt ; a1 = number clr b #>1,x1 ; b1 will be left_part(number) - (r0/2)**2 move b,r0 ; r0 will be sqrt(number) * 2 move a1,b0 ; b0 = number [shifts left into b1] move #<2,n1 do #12+ROUND,end ; 2 bits at a time for 24 bits in total asl b r0,n0 asl b ; b1 = left_part(number) tfr b,a (r0)+n0 ; a1/a0 is copy of b1/b0 ; r0 = 2 * r0 sub x1,a r0,r1 move r0,x0 sub x0,a (r1)+n1 ; a1 = b1 - r0 - 1 ; r1 = r0 + 2 tpl a,b r1,r0 ; if ( b1 >= r0 + 1 ) then [d2>d3] ; b1 = b1 - r0 - 1 ; r0 = r0 + 2 end move r0,a asr a ; a1 = r0/2 = sqrt(number) if ROUND==1 move #<2,a0 ; Force round up not convergent rounding asr a rnd a ; Round up endif ----------------------------------------------------------------- Terrence W. Holm holm@mprgate.mpr.ca