Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!rex!ames!vsi1!wyse!mips!wright.mips.com
From: earl@wright.mips.com (Earl Killian)
Newsgroups: comp.sys.m88k
Subject: Re: Information wanted on m88000 Risc workstations
Message-ID: <34446@mips.mips.COM>
Date: 13 Jan 90 00:18:25 GMT
References: <641@s5.Morgan.COM> <25A64468.11498@paris.ics.uci.edu> <648@s5.Morgan.COM> <1879@xyzzy.UUCP> <TOM.90Jan9101628@hcx2.ssd.csd.harris.com> <2811@yogi.oakhill.UUCP>
Sender: news@mips.COM
Reply-To: earl@wright.mips.com (Earl Killian)
Organization: MIPS Computer Systems Inc.
Lines: 26
In-reply-to: marvin@oakhill.UUCP (Marvin Denman)

In article <2811@yogi.oakhill.UUCP>, marvin@oakhill (Marvin Denman) writes:
>I disagree.  I think that unless the latency is very short (2 or
>maybe 3 cycles) that pipelining will pay off on a normal application
>mix.  The longer the latency, the more likely it is that you will
>want to unroll or reschedule code.  It will be interesting to see if
>MIPS goes to pipelining floating point instructions in future parts.

Pipelining makes less than 1% difference on the non-vector
applications that I've looked at.  Even on vector applications it is
unimportant if your latencies are short enough; 2- or 3-cycle adds
are doable, for example.  Consider the application being discussed,
matrix multiply, which is highly vectorizable.  If the original poster
is correct that the 88100, with its pipelined floating-point units,
tops out at 6.7 mflop/s in single-precision matrix multiplies, that
really proves this point.  The MIPS R3000, with non-pipelined
floating-point units, can do matrix multiplies at
			   25MHz	   33MHz
	single		11.8 mflop/s	15.7 mflop/s
	double		 7.8 mflop/s	10.4 mflop/s
This is an example of why MIPS prefers low-latency to pipelined fp.
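
As a rough sanity check on the table, the rates can be turned into
average clock cycles per floating-point operation.  This is only a
sketch: it assumes the clocks are exactly the nominal 25 and 33 MHz,
and the helper name cycles_per_flop is made up for illustration.  The
88100 is left out because its clock rate isn't given in this thread.

```python
# R3000 matrix-multiply rates from the table above:
# (precision, nominal clock in MHz) -> mflop/s
R3000_RATES = {
    ("single", 25): 11.8,
    ("single", 33): 15.7,
    ("double", 25): 7.8,
    ("double", 33): 10.4,
}


def cycles_per_flop(clock_mhz, mflops):
    """Average clock cycles spent per floating-point operation.

    clock_mhz is cycles per microsecond and mflops is flops per
    microsecond, so the ratio is cycles per flop.
    """
    return clock_mhz / mflops


if __name__ == "__main__":
    for (precision, clock_mhz), mflops in sorted(R3000_RATES.items()):
        cpf = cycles_per_flop(clock_mhz, mflops)
        print(f"{precision:6s} {clock_mhz} MHz: {cpf:.2f} cycles/flop")
```

Under those assumptions the single-precision figures work out to a
little over 2 cycles per flop at both clock rates, and the
double-precision figures to a little over 3.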
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086