Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!rex!ames!vsi1!wyse!mips!wright.mips.com
From: earl@wright.mips.com (Earl Killian)
Newsgroups: comp.sys.m88k
Subject: Re: Information wanted on m88000 Risc workstations
Message-ID: <34446@mips.mips.COM>
Date: 13 Jan 90 00:18:25 GMT
References: <641@s5.Morgan.COM> <25A64468.11498@paris.ics.uci.edu> <648@s5.Morgan.COM> <1879@xyzzy.UUCP> <2811@yogi.oakhill.UUCP>
Sender: news@mips.COM
Reply-To: earl@wright.mips.com (Earl Killian)
Organization: MIPS Computer Systems Inc.
Lines: 26
In-reply-to: marvin@oakhill.UUCP (Marvin Denman)

In article <2811@yogi.oakhill.UUCP>, marvin@oakhill (Marvin Denman) writes:
>I disagree. I think that unless the latency is very short (2 or
>maybe 3 cycles) that pipelining will pay off on a normal application
>mix. The longer the latency, the more likely it is that you will
>want to unroll or reschedule code. It will be interesting to see if
>MIPS goes to pipelining floating point instructions in future parts.

Pipelining makes less than 1% difference on the non-vector
applications that I've looked at. Even on vector applications it is
unimportant if your latencies are short enough; 2- or 3-cycle adds
are doable, for example.

Consider the application being discussed, matrix multiply, which is
highly vectorizable. If the original poster is correct that the
88100, with its pipelined floating-point units, tops out at 6.7
mflop/s in single-precision matrix multiplies, it really proves this
point. The MIPS R3000, with non-pipelined floating-point units, can
do matrix multiplies at:

                 25MHz          33MHz
   single     11.8 mflop/s   15.7 mflop/s
   double      7.8 mflop/s   10.4 mflop/s

This is an example of why MIPS prefers low-latency to pipelined fp.
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
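A toy cycle-count model can make the latency-versus-pipelining trade-off in this exchange concrete. The sketch below is hypothetical: `dot_product_add_cycles` is an illustrative name, and it models only the floating-point adder in a dot-product inner loop (the core of matrix multiply), ignoring loads, multiplies, issue width, and the final reduction of partial sums. It is not a model of the R3000 or the 88100, and its cycle counts are not the measurements quoted above.

```python
def dot_product_add_cycles(n, latency, pipelined, accumulators=1):
    """Toy model: cycles to finish n FP adds in a dot-product loop.

    The n adds are dealt round-robin to `accumulators` independent partial
    sums (as after unrolling). Each add must wait for the previous add on
    its own partial sum. A pipelined adder accepts a new add every cycle;
    a non-pipelined adder stays busy for `latency` cycles per add.
    Assumptions of this sketch: loads, multiplies, issue width, and the
    final combination of partial sums are all ignored.
    """
    ready = [0] * accumulators  # cycle when each partial sum is next available
    unit_free = 0               # earliest cycle the adder accepts a new add
    finish = 0
    for i in range(n):
        chain = i % accumulators
        issue = max(ready[chain], unit_free)  # wait for operand and for the unit
        done = issue + latency
        ready[chain] = done
        # Pipelined: next add may start one cycle later; otherwise wait for completion.
        unit_free = issue + 1 if pipelined else done
        finish = max(finish, done)
    return finish


if __name__ == "__main__":
    for latency in (2, 6):
        for pipelined in (False, True):
            for acc in (1, latency):
                cycles = dot_product_add_cycles(100, latency, pipelined, acc)
                kind = "pipelined" if pipelined else "non-pipelined"
                print(f"latency={latency} {kind:13} accumulators={acc}: {cycles} cycles")
```

With 100 adds and a 2-cycle latency, the model gives 200 cycles for every single-accumulator case and 101 cycles only for the pipelined, two-accumulator case. In other words, pipelining buys nothing until the loop is unrolled or rescheduled into independent partial sums, which is Marvin's point, while a short latency keeps the non-pipelined unit's worst case small, which is Earl's. Because this adder-only toy leaves out the loads and multiplies, real code may show a smaller gap than it does.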