Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!apple!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Info on DSP chips
Message-ID: <24516@winchester.mips.COM>
Date: 2 Aug 89 05:06:31 GMT
References: <337@venus.iotek.UUCP> <23379@winchester.mips.COM> <277@melair.UUCP> <3469@epimass.EPI.COM> <344@venus.iotek.UUCP>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 45

In article <344@venus.iotek.UUCP> garyb@venus.UUCP (Gary Burrell) writes:
>In article <3469@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
....
>>Unless you took account of a bug in the C30 simulator, your number is
>>a bit too optimistic: it always takes two cycles to write to external
>>memory, even with zero wait states; the C30 simulator counts it as one.
>>To get the true time, add a cycle for each external memory write cycle.

>    This is one reason why I was questioning the original results
>in the afterword of DSP micro Dec 88. They were comparing estimated
>(not even simulated) data to real world benchmarks on supercomputers
>and coming up with some amazing results. (est 20 MFLOPS Single Prec.
>Linpack for the TMS320C30).
>    IMHO one should not compare estimated, simulated and real data
>as estimation and simulation often err on the side of optimism.

It is often necessary to compare such things, in order to figure out
whether something is worth building or not. I do think that it is
very important to:
    a) Precisely label every such number as measured, simulated, or
    estimated, and if simulated or estimated, say with what memory
    configuration, i.e., give enough detail to be convincing that
    something is reasonably buildable.
    b) Precisely label what kind of MFLOPs you're talking about. FFTs
    are not FORTRAN DP 100x100 LINPACK MFLOPs, for example.

Note that to get anything close to the peak rates on LINPACK, you probably:
    a) Have a vector machine, including a 3-pipe memory system.
    OR
    b) A scalar machine, with minimal-latency caches big enough to hold
    the array for LINPACK, and the cache pre-loaded with all of the
    data, and a cache structure that doesn't end up generating more
    misses, and that doesn't conflict with the different array sizes
    (201, etc) of which the 100x100 is a subarray.
    AND appropriate optimizing compilers

Few micros are a) or b) .......
--
-john mashey    DISCLAIMER:
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086