Path: utzoo!attcan!uunet!dino!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew From: aglew@oberon.csg.uiuc.edu (Andy Glew) Newsgroups: comp.arch Subject: Re: 68040 Message-ID: Date: 8 Feb 90 00:01:38 GMT References: <851@trane.UUCP> <7793@quick.COM> <5120@crdgw1.crd.ge.com> <1224@m3.mfci.UUCP> Sender: news@ux1.cso.uiuc.edu (News) Organization: University of Illinois, Computer Systems Group Lines: 119 In-Reply-To: colwell@mfci.UUCP's message of 7 Feb 90 02:13:15 GMT >>srg@quick (Spencer Garrett) writes: >>I think I've seen array circuit designs that did mul, div and sqrt. >>I think I remember that once you've built an array divider, >>it's not hard to make it do square-root as well. > >An array divider? Are you sure? That's a new one on me. Who >does division this way? I know about array multipliers, and folks >who do division based on Newton Raphson approximation (and sqrts too, >for that matter.) And there are parts like Weitek's and BIT's that >generate quotients and roots via iterative methods like radix-4 >non-restoring algorithms. For those, sqrt is exactly twice as hard, >because you get only one bit of root per iteration, where division >yields two. > >I did see a paper about two years ago in IEEE Trans. on Computers >that showed a way to get two sqrt bits per iteration, but I >recall thinking it would be hard to implement it (then). > >Bob Colwell ..!uunet!mfci!colwell There are a number of designs for array dividers. (I'll hunt up papers if you want. A good selection of references should be in the upcoming IEEE Transactions on Computers special issue on Computer Arithmetic). Most array dividers, however, are iterative. The time through the array is O(n) (and, of course, the area is O(n^2)). Full array multipliers come in the iterative O(n) time variety, where all the additions propagate linearly across the array. Much more interesting are array multipliers with embedded trees to suim the partial products: 3:2 Wallace trees or 2:1 trees, usually using a redundant form. These get O(log n) time performance out of the array. This sort of thing seems to be the state of the art - M88K, AMD's latest. (Amusing side note: all the old VLSI papers used to say that Wallace trees were too hard to lay out, too irregular. A circuit designer said to me <>) I have not seen any array dividers with better than O(n) time. It seems that the division algorithms are inherently "Choose some bits; Form partial quotient and remnainder; Repeat", ie. the linearity seems inherent in everything except Newton-Raphson or multiplicative approaches. Higher radix square root - quite a few papers report 2 bits/cycle. Fandrianto has reported 3 (and implied 4). Cyrix, with their 17 bit/cycle divide, could likely use NR to get mongo bits for sqrt - I'm not sure if they did. References: %A Kuninnobu %A Nisiyama %A Edamatsu %A Taniguchi %A Takagi %T Design of High Speed MOS Multiplier and Divider Using Redundant Binary Representation %J ARITH8 %D 1987 %P 80-86 %X Uses redundant {-1,0,1} arithmetic to obtain 2:1 adders; pre-manipulation to use only n/4 terms. I know these guys used an array multiplier. They may have used an array divider - I'm not sure. The following papers describe higher radix sqrt algorithms. %A Bruce Gene De\0Lugish %T A Class of Algorithms for Automatic Evaluation of Certain Elementary Functions in a Binary Computer %J University of Illinois PhD thesis %D June 1, 1970 %C Urbana-Champaign %K Lugish %X Normalization digit selection algorithms for multiply, divide, ln, exp, sqrt, tan, cotan, sin, cos, arctan, arcsin, arccos. Additive and multiplicative digit based algorithms. Suggested hardware structure. %A Catherine Yuk-Fun Chow %T A Variable Precision Processor Module %J University of Illinois PhD thesis %D July 1980 %C Urbana-Champaign %X Predecessor to Carter's work. %A Jan Fandrianto %T Algorithm for High-Speed Shared Radix 4 Division and Radix 4 Square Root %K radix4 %J ARITH8 %D 1987 %P 73-79 %X Probably the paper Bob Colwell found "hard to do". %A Fandrianto %T Algorithm for High Speed Shared Radix 8 Division and Radix 8 Square Root %J ARITH9 %K radix8 %P 68-75 %X Doesn't cascade like Taylor's radix 16 = radix 4 squared. But could be considered a radix 2 cascaded with a radix 4 implementation, thus avoiding the need for a 3x generation. %A Montuschi %A Ciminiera %T On the Efficient Implementation of Higher Radix Square Root Algorithms %J ARITH9 %P 154-161 %X Uses 7 bits to select radix 4 digit. %A Ercegovac %A Lang %T Radix 4 Square Root Without Initial PLA %J ARITH9 %P 162-168 -- Andy Glew, aglew@uiuc.edu