Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!sun-barr!ames!sgi!bron@bronze.wpd.sgi.com
From: bron@bronze.wpd.sgi.com (Bron Campbell Nelson)
Newsgroups: comp.arch
Subject: Re: MIPS/MFLOPS ratio [long; here we go again; sorry]
Summary: One data point
Message-ID: <37530@sgi.SGI.COM>
Date: 7 Jul 89 02:43:54 GMT
References: <596@megatek.UUCP> <112807@sun.Eng.Sun.COM> <114015@sun.Eng.Sun.COM>
Sender: daemon@sgi.SGI.COM
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 55

In article <22792@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
[ A whole bunch of stuff, including: ]
> 3) It really is boring having to respond to marketing FUD and
> rewritings of history in comp.arch. There are better things to do, and I'd much
> rather see discussion of things like (to pick a simple case):
>     Which is better: 2-cycle + & 5-cycle *, or 3-cycle + & 4-cycle *?
>     On which kinds of benchmarks? why?
>     How much difference does it make in performance? in silicon space?
>
> I.e., things that give DATA, and even better INSIGHT........
[...]
> -john mashey DISCLAIMER:
> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
> DDD: 408-991-0253 or 408-720-1700, x253
> USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

Here is a single data point, drawn from Lawrence Livermore National Labs.
{Ref: Harry Nelson, "Using the Performance Monitors on the X-MP/48";
Tentacle; vol V, num. 9, Sept/Oct 1985 (LLNL internal publication).}

Results are reported for a 30 hour weekend full production run (i.e.
almost all batch jobs doing "real" work, very little interactivity).
Exactly which programs were running is not known, but the author claims
(based on several similar experiments) that this is a representative
sample. Note by the way that these were real jobs doing real work, not
a set of benchmarks or test programs.

During that time, the following operation counts were seen (1 CPU):

        F.P. add:         1198 *10^9
        F.P. multiply:    1346 *10^9
        F.P. reciprocal:   135 *10^9

These numbers include both scalar and vector F.P. operations. The
multiply numbers are slightly inflated due to the lack of an F.P. divide
operation on the X-MP; to do a full divide (i.e. A/B) requires one
reciprocal and three multiplies. If we assume all the reciprocals
represent divides, and subtract three multiplies per reciprocal from
the multiply count (1346 - 3*135 = 941), we get

        +:  1198  =>  53%
        *:   941  =>  41%
        /:   135  =>   6%

The surprising thing (to me) is how close the + and * numbers are. What
this unfortunately means is that the answer is not very clear. It
involves answering questions like "how frequently can an add be
overlapped with a multiply?", and "how often is an add on the critical
path?" F.P. adds are not so abundant (relative to multiplies) that the
question can be dismissed, but it is certainly not something I'd
recommend without a lot of supporting evidence, and even then it looks
to be a fairly marginal optimization. The silicon might be better
invested in doing something else (maybe hardware sqrt?).

--
Bron Campbell Nelson
bron@sgi.com or possibly ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.
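
For anyone who wants to check the divide adjustment, here is a minimal Python sketch of the arithmetic. It assumes, as the post does, that every reciprocal belongs to a full divide costing three multiplies. The function and variable names are illustrative only and do not come from the LLNL report.

```python
# Operation counts from the post, in units of 10^9 operations.
FP_ADD = 1198
FP_MULTIPLY = 1346
FP_RECIPROCAL = 135

# Assumption taken from the post: a full divide A/B on the X-MP costs
# one reciprocal plus three multiplies.
MULTIPLIES_PER_DIVIDE = 3


def adjusted_mix(adds, multiplies, reciprocals):
    """Return {op: (count, percent)} after charging divide multiplies to '/'."""
    # Remove the multiplies that were really part of divide sequences.
    true_multiplies = multiplies - MULTIPLIES_PER_DIVIDE * reciprocals
    counts = {"+": adds, "*": true_multiplies, "/": reciprocals}
    total = sum(counts.values())
    # Percentages are rounded to the nearest whole number, as in the post.
    return {op: (n, round(100 * n / total)) for op, n in counts.items()}


mix = adjusted_mix(FP_ADD, FP_MULTIPLY, FP_RECIPROCAL)
for op, (count, percent) in mix.items():
    print(f"{op}: {count:5d} => {percent}%")
```

The whole result hinges on the three-multiplies-per-divide assumption: in the raw counts multiplies (1346) outnumber adds (1198), and only after the adjustment do adds come out ahead.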