Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!apple!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Info on DSP chips
Message-ID: <24516@winchester.mips.COM>
Date: 2 Aug 89 05:06:31 GMT
References: <337@venus.iotek.UUCP> <23379@winchester.mips.COM> <277@melair.UUCP> <3469@epimass.EPI.COM> <344@venus.iotek.UUCP>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 45

In article <344@venus.iotek.UUCP> garyb@venus.UUCP (Gary Burrell) writes:
>In article <3469@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
....
>>Unless you took account of a bug in the C30 simulator, your number is
>>a bit too optimistic: it always takes two cycles to write to external
>>memory, even with zero wait states; the C30 simulator counts it as one.
>>To get the true time, add a cycle for each external memory write cycle.
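
The correction described above is simple arithmetic. As a rough sketch
only (the function name and the sample counts in the comment are
hypothetical, not taken from any real C30 run), one extra cycle is added
per external memory write:

```python
def corrected_cycles(simulated_cycles, external_writes):
    """Adjust a simulator cycle count that charged 1 cycle per external write.

    Assumption (from the quoted description): each external memory write
    really costs 2 cycles, so the simulator is short by one cycle per write.
    """
    if simulated_cycles < 0 or external_writes < 0:
        raise ValueError("counts must be non-negative")
    return simulated_cycles + external_writes


# Hypothetical example: a loop the simulator reports as 1000 cycles,
# containing 50 external writes, would really take 1050 cycles.
```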

>	This is one reason why I was questioning the original results
>in the afterword of DSP micro Dec 88.  They were comparing estimated
(not even simulated) data to real world benchmarks on supercomputers
and coming up with some amazing results.  (est 20 MFLOPS Single Prec.
>Linpack for the TMS320C30).

>	IMHO one should not compare estimated, simulated and real data
>as estimation and simulation often err on the side of optimism.

It is often necessary to compare such things, in order to figure out
whether something is worth building or not.  I do think that it is very
important to:
	a) Precisely label every such number as measured, simulated, or
	estimated, and, if simulated or estimated, say what memory
	configuration was assumed, i.e., be convincing that the thing
	is reasonably buildable.
	b) Precisely label what kind of MFLOPs you're talking about.
	FFTs are not FORTRAN DP 100x100 LINPACK MFLOPs, for example.
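
One way to keep such labels attached to the numbers is sketched below.
This is only an illustration: the class and field names are made up, and
the one sample record is the estimated TMS320C30 figure quoted above,
whose memory configuration was not given.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    MEASURED = "measured"
    SIMULATED = "simulated"
    ESTIMATED = "estimated"


@dataclass(frozen=True)
class MflopsClaim:
    machine: str
    mflops: float
    source: Source          # how the number was obtained
    workload: str           # which kind of MFLOPS this is
    memory_config: str = "not stated"

    def label(self):
        # Every printed number carries its source, workload, and memory setup.
        return (f"{self.machine}: {self.mflops:g} MFLOPS "
                f"({self.source.value}; {self.workload}; "
                f"memory: {self.memory_config})")


c30 = MflopsClaim("TMS320C30", 20.0, Source.ESTIMATED,
                  "single-precision LINPACK")
print(c30.label())
```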
Note that to get anything close to the peak rates on LINPACK,
you probably need:
	a) A vector machine, including a 3-pipe memory system.
OR
	b) A scalar machine, with minimal-latency caches big enough
	to hold the array for LINPACK, the cache pre-loaded with
	all of the data, and a cache structure that doesn't
	generate extra misses by conflicting with the declared
	array sizes (201, etc.) of which the 100x100 is a subarray.

AND
	appropriate optimizing compilers
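
To get a feel for the cache requirement in b), here is a rough sketch.
It assumes things the text above does not say: 8-byte elements stored
column-major with leading dimension 201, the array starting at address 0,
and a direct-mapped cache with 16-byte lines. The cache sizes are
arbitrary examples, not any particular machine.

```python
from collections import Counter


def linpack_cache_footprint(n=100, lda=201, elem_bytes=8,
                            cache_bytes=64 * 1024, line_bytes=16):
    """Return (distinct_lines, worst_lines_per_set) for an n x n subarray.

    Assumes a direct-mapped cache and a column-major array whose first
    element sits at address 0; lda is the declared leading dimension.
    """
    sets = cache_bytes // line_bytes
    lines = set()
    for col in range(n):
        for row in range(n):
            addr = (col * lda + row) * elem_bytes
            lines.add(addr // line_bytes)
    # Count how many distinct lines compete for each cache set.
    per_set = Counter(line % sets for line in lines)
    return len(lines), max(per_set.values())


for size_kb in (64, 256):
    lines, worst = linpack_cache_footprint(cache_bytes=size_kb * 1024)
    print(size_kb, "KB cache:", lines, "lines touched,",
          worst, "line(s) in the busiest set")
```

The 100x100 subarray alone is 80,000 bytes of data, so under these
assumptions a 64 KB cache cannot hold it without some lines evicting
each other, whatever the mapping.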

Few micros are a) or b) .......
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086