Path: utzoo!attcan!uunet!dino!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew
From: aglew@oberon.csg.uiuc.edu (Andy Glew)
Newsgroups: comp.arch
Subject: Re: 68040
Message-ID: <AGLEW.90Feb7180138@oberon.csg.uiuc.edu>
Date: 8 Feb 90 00:01:38 GMT
References: <851@trane.UUCP> <7793@quick.COM> <5120@crdgw1.crd.ge.com>
	<1224@m3.mfci.UUCP>
Sender: news@ux1.cso.uiuc.edu (News)
Organization: University of Illinois, Computer Systems Group
Lines: 119
In-Reply-To: colwell@mfci.UUCP's message of 7 Feb 90 02:13:15 GMT

>>srg@quick (Spencer Garrett) writes:
>>I think I've seen array circuit designs that did mul, div and sqrt. 
>>I think I remember that once you've built an array divider,
>>it's not hard to make it do square-root as well.
>
>An array divider?  Are you sure?  That's a new one on me.  Who
>does division this way?  I know about array multipliers, and folks 
>who do division based on Newton Raphson approximation (and sqrts too,
>for that matter.)  And there are parts like Weitek's and BIT's that 
>generate quotients and roots via iterative methods like radix-4 
>non-restoring algorithms.  For those, sqrt is exactly twice as hard, 
>because you get only one bit of root per iteration, where division
>yields two.
>
>I did see a paper about two years ago in IEEE Trans. on Computers
>that showed a way to get two sqrt bits per iteration, but I
>recall thinking it would be hard to implement it (then).
>
>Bob Colwell               ..!uunet!mfci!colwell


There are a number of designs for array dividers. (I'll hunt up papers
if you want.  A good selection of references should be in the upcoming
IEEE Transactions on Computers special issue on Computer Arithmetic).

Most array dividers, however, are iterative.  The time through the array
is O(n) (and, of course, the area is O(n^2)).
    Full array multipliers come in the iterative O(n) time variety,
where all the additions propagate linearly across the array.  Much
more interesting are array multipliers with embedded trees to suim the
partial products: 3:2 Wallace trees or 2:1 trees, usually using a
redundant form.  These get O(log n) time performance out of the array.
This sort of thing seems to be the state of the art - M88K, AMD's latest.
(Amusing side note: all the old VLSI papers used to say that Wallace trees
were too hard to lay out, too irregular.  A circuit designer said to me
<<Obviously, nobody who said that ever tried to lay one out. They're easy.>>)
    I have not seen any array dividers with better than O(n) time.
It seems that the division algorithms are inherently "Choose some bits;
Form partial quotient and remnainder; Repeat", ie. the linearity seems inherent
in everything except Newton-Raphson or multiplicative approaches. 


Higher radix square root - quite a few papers report 2 bits/cycle.
Fandrianto has reported 3 (and implied 4). Cyrix, with their 17
bit/cycle divide, could likely use NR to get mongo bits for sqrt - I'm
not sure if they did.

References:

%A Kuninnobu
%A Nisiyama
%A Edamatsu
%A Taniguchi
%A Takagi
%T Design of High Speed MOS Multiplier and Divider
Using Redundant Binary Representation
%J ARITH8
%D 1987
%P 80-86
%X Uses redundant {-1,0,1} arithmetic to obtain 2:1 adders;
pre-manipulation to use only n/4 terms.
    I know these guys used an array multiplier. They may have used
an array divider - I'm not sure.


The following papers describe higher radix sqrt algorithms.

%A Bruce Gene De\0Lugish
%T A Class of Algorithms for Automatic Evaluation
of Certain Elementary Functions in a Binary Computer
%J University of Illinois PhD thesis
%D June 1, 1970
%C Urbana-Champaign
%K Lugish
%X Normalization digit selection algorithms for 
multiply, divide, ln, exp,
sqrt, tan, cotan, sin, cos, arctan, arcsin, arccos.
Additive and multiplicative digit based algorithms.
Suggested hardware structure.

%A Catherine Yuk-Fun Chow
%T A Variable Precision Processor Module
%J University of Illinois PhD thesis
%D July 1980
%C Urbana-Champaign
%X Predecessor to Carter's work.

%A Jan Fandrianto
%T Algorithm for High-Speed Shared Radix 4 Division
and Radix 4 Square Root
%K radix4
%J ARITH8
%D 1987
%P 73-79
%X Probably the paper Bob Colwell found "hard to do".

%A Fandrianto
%T Algorithm for High Speed Shared Radix 8 Division and Radix 8 Square Root
%J ARITH9
%K radix8
%P 68-75
%X Doesn't cascade like Taylor's radix 16 = radix 4 squared.
But could be considered a radix 2 cascaded with a radix 4 implementation,
thus avoiding the need for a 3x generation.

%A Montuschi
%A Ciminiera
%T On the Efficient Implementation of Higher Radix Square Root Algorithms
%J ARITH9
%P 154-161
%X Uses 7 bits to select radix 4 digit.

%A Ercegovac
%A Lang
%T Radix 4 Square Root Without Initial PLA
%J ARITH9
%P 162-168
--
Andy Glew, aglew@uiuc.edu