Path: utzoo!attcan!uunet!cs.utexas.edu!mailrus!ncar!asuvax!mcdphx!udc!chant!aglew
From: aglew@urbana.mcd.mot.com (Andy-Krazy-Glew)
Newsgroups: comp.arch
Subject: Re: 3010 fp (was linpack)
Message-ID:
Date: 27 Oct 89 23:32:46 GMT
References: <36621@lll-winken.LLNL.GOV> <3300080@m.cs.uiuc.edu> <30100@obiwan.mips.COM> <34443@ames.arc.nasa.gov>
Sender: aglew@urbana.mcd.mot.com
Organization: Work: Motorola MCD, Urbana Design Center; School: University of Illinois at Urbana-Champaign
Lines: 54
In-reply-to: lamaster@ames.arc.nasa.gov's message of 25 Oct 89 20:40:02 GMT

>[Hugh LaMaster, commenting on MIPS' FP unit]: Fortunately, there are
>ideas good for more improvement up to about 1 million gates, based on
>Cray and CDC/ETA designs.

Can you provide any references or publications on these designs?
Hell - if anyone has a technical reference or gate layouts, I'd be
interested in seeing them...

>(Some of the CDC Cyber 205 models actually had fully segmented
>division, but it added a lot of extra real estate...)
>
>The first increment of improvement could come about by segmenting
>addition only and giving the multiply unit its own round/normalize
>capability.

What do you mean by "segmenting"? Do you mean pipelining - eg. so that
divide doesn't need to use the same hardware over and over again for
several cycles?

By the way, can anyone provide details on Cyrix's 80387 superset
floating point chip? I have heard, for example, that it does IEEE
extended (80 bit) division in 4 cycles. It uses quotient prediction of
17 bits, with a 17 by 69 bit multiplier array used in the iteration.
(Let's see, 17 summands is no more than 5 3:2 CSA levels - actually
fewer. That's fairly long as divider cycles go, according to
Fandrianto, but I suppose that it needs to be that long in order to
predict 17 bits)

Does anyone know how they predict all 17 bits? Is it one level, or is
it several levels (the way Taylor got 8 bit prediction by doing 4 bit
radix 16 prediction twice)?

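The back-of-envelope counts in the Cyrix paragraph can be checked with a short sketch. It assumes a plain tree built only from 3:2 carry-save adders that reduces the summands to two rows before a final carry-propagate add, and it assumes each divide iteration retires exactly 17 quotient bits of a 64-bit extended-precision significand. The function names are made up for illustration; nothing here is taken from Cyrix documentation.

```python
import math

def csa_levels(rows: int) -> int:
    """Levels of 3:2 carry-save adders needed to reduce `rows` summands to 2."""
    levels = 0
    while rows > 2:
        # Every full group of 3 rows becomes 2 rows; leftover rows pass through.
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels

def divide_cycles(significand_bits: int, bits_per_cycle: int) -> int:
    """Iterations needed if each one produces `bits_per_cycle` quotient bits."""
    return math.ceil(significand_bits / bits_per_cycle)

print(csa_levels(17))         # 17 unrecoded summands
print(csa_levels(9))          # 9 summands, e.g. after radix-4 Booth recoding
print(divide_cycles(64, 17))  # 64-bit significand, 17 bits per iteration
```

Under this counting, 17 unrecoded summands take six 3:2 levels (17, 12, 8, 6, 4, 3, 2), one more than the estimate of 5. If the multiplier used radix-4 Booth recoding (an assumption), the roughly 9 resulting summands would need four levels, which may be what "actually fewer" has in mind. Four iterations of 17 bits give 68 bits, enough to cover the 64-bit significand, which is consistent with the 4-cycle figure.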
And in another FP question, I notice that the IBM America (RT-2?) has
a multiply-accumulate instruction that performs no intermediate
rounding. Ie. it is ROUND(A*B+C). This is "more accurate" but does
not necessarily produce the same answers as ROUND(ROUND(A*B)+C).
I wonder what the numerical analysis mavens have to say about this: is
it okay to get an answer more accurate than IEEE, or is this something
to be avoided?

--
Andy "Krazy" Glew, Motorola MCD, aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA. {uunet!,}uiucuxc!udc!aglew

My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.
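
On the multiply-accumulate question above, the difference between ROUND(A*B+C) and ROUND(ROUND(A*B)+C) can be illustrated in IEEE double precision. This is only a sketch: it models the single-rounding result with exact rational arithmetic via Python's `fractions.Fraction` and then rounds once to the nearest double. It does not model the IBM instruction itself, and the operand values are chosen purely for illustration.

```python
from fractions import Fraction

def round_once(a: float, b: float, c: float) -> float:
    """ROUND(A*B+C): form the exact result, then round a single time."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # conversion rounds to the nearest double

def round_twice(a: float, b: float, c: float) -> float:
    """ROUND(ROUND(A*B)+C): the product is rounded before the add."""
    return a * b + c

# Illustrative operands: A*B is exactly 1 - 2**-60, which rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(round_twice(a, b, c))  # the -2**-60 term is lost when A*B is rounded
print(round_once(a, b, c))   # the -2**-60 term survives the single rounding
```

With these operands the twice-rounded form returns 0.0 while the once-rounded form returns -2**-60, so the two forms can disagree in every significant digit when the product nearly cancels against C.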