Path: utzoo!attcan!uunet!cs.utexas.edu!mailrus!ncar!asuvax!mcdphx!udc!chant!aglew
From: aglew@urbana.mcd.mot.com (Andy-Krazy-Glew)
Newsgroups: comp.arch
Subject: Re: 3010 fp (was linpack)
Message-ID: <AGLEW.89Oct27193246@chant.urbana.mcd.mot.com>
Date: 27 Oct 89 23:32:46 GMT
References: <36621@lll-winken.LLNL.GOV> <3300080@m.cs.uiuc.edu> <30100@obiwan.mips.COM> <34443@ames.arc.nasa.gov>
Sender: aglew@urbana.mcd.mot.com
Organization: Work: Motorola MCD, Urbana Design Center; School: University of Illinois at Urbana-Champaign
Lines: 54
In-reply-to: lamaster@ames.arc.nasa.gov's message of 25 Oct 89 20:40:02 GMT

>[Hugh LaMaster, commenting on MIPS' FP unit]: Fortunately, there are
>ideas good for more improvement up to about 1 million gates, based on
>Cray and CDC/ETA designs.

Can you provide any references or publications on these designs?
Hell - if anyone has a technical reference or gate layouts, I'd be
interested in seeing them...


>(Some of the CDC Cyber 205 models actually had fully segmented
>division, but it added a lot of extra real estate...)
>
>The first increment of improvement could come about by segmenting
>addition only and giving the multiply unit its own round/normalize
>capability.

What do you mean by "segmenting"?  Do you mean pipelining, e.g. so that
divide doesn't have to reuse the same hardware over and over again for
several cycles?
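
If "segmented" does mean pipelined, the payoff is in throughput rather
than latency. A rough sketch, assuming independent operations and a unit
that can accept a new operation every cycle; the function names and the
20-cycle latency are made up for illustration, not taken from any real
divider:

```python
def cycles_iterative(n_ops, latency):
    """Unsegmented unit: each operation ties up the hardware until it finishes."""
    return n_ops * latency


def cycles_pipelined(n_ops, latency):
    """Fully segmented unit: first result after `latency` cycles, then one per cycle."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


if __name__ == "__main__":
    # Hypothetical 20-cycle divide, 100 independent divides.
    print(cycles_iterative(100, 20))
    print(cycles_pipelined(100, 20))
```

The cost is that every stage needs its own hardware, which fits the
"extra real estate" remark quoted above.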


By the way, can anyone provide details on Cyrix's 80387 superset floating
point chip?  I have heard, for example, that it does IEEE extended (80 bit)
division in 4 cycles.  It uses quotient prediction of 17 bits, with a
17 by 69 bit multiplier array used in the iteration.
    (Let's see, 17 summands take 6 levels of 3:2 CSAs in a plain tree -
fewer if the multiplier is Booth recoded. That's fairly long as divider
cycles go, according to Fandrianto, but I suppose that it needs to be
that long in order to predict 17 bits.)
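
The CSA level count is easy to check mechanically. The sketch below
greedily groups summands three at a time (each 3:2 CSA turns three rows
into two) until only two rows remain for the final carry-propagate add.
By this count 17 summands need 6 levels, and 9 summands need 4; 9 is what
radix-4 Booth recoding of a 17 bit operand would leave, which is an
assumption on my part and not something I know about the Cyrix part.

```python
def csa_levels(summands):
    """Levels of 3:2 carry-save adders needed to reduce `summands` rows to 2."""
    levels = 0
    while summands > 2:
        groups, leftover = divmod(summands, 3)
        # Each full group of 3 rows becomes 2 rows; leftover rows pass through.
        summands = 2 * groups + leftover
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (17, 9):
        print(n, "summands ->", csa_levels(n), "levels")
```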
    Does anyone know how they predict all 17 bits? Is it one level, or is
it several levels (the way Taylor got 8 bit prediction by doing 4 bit radix 
16 prediction twice)?
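
The 4 cycle figure is at least consistent with 17 bits per step: the
extended format has a 64 bit significand, and 4 x 17 = 68 bits covers
that with a few bits to spare for rounding. Below is a minimal sketch of
radix 2^17 digit-at-a-time division on integers. It picks each digit
exactly with an integer divide, whereas real hardware would predict the
digit from a few leading bits; how Cyrix does that is exactly what I
don't know. The function name and operand scaling are made up for
illustration.

```python
def divide_high_radix(n, d, bits_per_step=17, steps=4):
    """Return (q, rem) with q = floor(n * 2**(bits_per_step*steps) / d).

    Requires 0 <= n < d, as for a partial remainder smaller than the divisor.
    """
    assert 0 <= n < d
    q = 0
    rem = n
    for _ in range(steps):
        rem <<= bits_per_step
        digit = rem // d      # exact digit; hardware would predict this
        rem -= digit * d      # the digit-by-divisor product of each iteration
        q = (q << bits_per_step) | digit
    return q, rem


if __name__ == "__main__":
    n = 1 << 63               # significand of 1.0, scaled to 64 bits
    d = 3 << 62               # significand of 1.5, scaled to 64 bits
    q, rem = divide_high_radix(n, d)
    print(hex(q), q.bit_length())
```

The `digit * d` product is a 17 bit digit times a wide divisor, which is
presumably the job of a 17 by 69 bit array.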

    
And in another FP question, I notice that the IBM America (RT-2?) has
a multiply-accumulate instruction that performs no intermediate
rounding, i.e. it computes ROUND(A*B+C).  This is "more accurate" but does
not necessarily produce the same answers as ROUND(ROUND(A*B)+C).
I wonder what the numerical analysis mavens have to say about this:
is it okay to get an answer more accurate than IEEE, or is this
something to be avoided?
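
A small case where the two roundings disagree, as a sketch in IEEE double
precision rather than on the IBM hardware. The single-rounding version is
emulated with exact rational arithmetic followed by one conversion back
to double; the function names and operand values are chosen for
illustration only.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    """ROUND(ROUND(A*B)+C): ordinary double multiply, then double add."""
    return (a * b) + c


def mul_add_one_rounding(a, b, c):
    """ROUND(A*B+C): exact product and sum, rounded once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    a = 1.0 + 2.0**-30
    b = 1.0 - 2.0**-30
    c = -1.0
    # Exactly, a*b = 1 - 2**-60, which rounds to 1.0 in double precision.
    print(mul_add_two_roundings(a, b, c))
    print(mul_add_one_rounding(a, b, c))
```

Here the separately rounded version returns exactly zero, while the
single rounding keeps the -2**-60 that the intermediate rounding threw
away.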


--
Andy "Krazy" Glew,  Motorola MCD,    	    	    aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.          {uunet!,}uiucuxc!udc!aglew
   
My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.