Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!sun-barr!ames!sgi!bron@bronze.wpd.sgi.com
From: bron@bronze.wpd.sgi.com (Bron Campbell Nelson)
Newsgroups: comp.arch
Subject: Re: MIPS/MFLOPS ratio [long; here we go again; sorry]
Summary: One data point
Message-ID: <37530@sgi.SGI.COM>
Date: 7 Jul 89 02:43:54 GMT
References: <596@megatek.UUCP> <112807@sun.Eng.Sun.COM> <114015@sun.Eng.Sun.COM>
Sender: daemon@sgi.SGI.COM
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 55


In article <22792@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
[ A whole bunch of stuff, including: ]

> 3) It really is boring having to respond to marketing FUD and
> rewritings of history in comp.arch.  There are better things to do, and I'd much
> see discussion of things like (to pick a simple case):
> 	Which is better: 2-cycle + & 5-cycle *, or 3-cycle + & 4-cycle *?
> 	On which kinds of benchmarks? why?
> 	How much difference does it make in performance? in silicon space?
> 
> I.e., things that give DATA, and even better INSIGHT........
[...]
> -john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
> UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
> DDD:  	408-991-0253 or 408-720-1700, x253
> USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

Here is a single data point, drawn from Lawrence Livermore National Labs.
{Ref: Harry Nelson, "Using the Performance Monitors on the X-MP/48";
Tentacle; vol V, num. 9, Sept/Oct 1985 (LLNL internal publication).}
Results are reported for a 30 hour weekend full production run (i.e. almost
all batch jobs doing "real" work, very little interactivity).  Exactly which
programs were running is not known, but the author claims (based on several
similar experiments) that this is a representative sample.  Note by the
way that these were real jobs doing real work, not a set of benchmarks or
test programs.

During that time, the following operation counts were seen (1 CPU):
	F.P. add:	1198 *10^9
	F.P. multiply:	1346 *10^9
	F.P. reciprocal: 135 *10^9
These numbers include both scalar and vector F.P. operations.  The multiply
numbers are slightly inflated due to the lack of an F.P. divide operation on
the X-MP; to do a full divide (i.e. A/B) requires one reciprocal and
three multiplies.  If we assume all the reciprocals represent divides,
and subtract the three multiplies per divide (3 * 135 = 405) from the
multiply count, we get (again in units of 10^9):
	+:	1198  => 53%
	*:	 941  => 41%
	/:	 135  =>  6%
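
The adjustment above can be checked with a few lines of Python.  This is
only a sketch of the arithmetic: the counts are the ones quoted from the
Tentacle article (in units of 10^9), while the function and variable names
are made up for illustration.

```python
# Raw operation counts from the LLNL X-MP/48 run, in units of 10^9.
RAW_ADD = 1198
RAW_MUL = 1346
RAW_RECIP = 135

# Assumption (from the text): every reciprocal belongs to a full divide,
# and each divide costs one reciprocal plus three multiplies.
MULS_PER_DIVIDE = 3


def adjusted_mix(add, mul, recip, muls_per_divide=MULS_PER_DIVIDE):
    """Return {op: (count, percent)} after removing divide-related multiplies."""
    counts = {
        "+": add,
        "*": mul - muls_per_divide * recip,
        "/": recip,
    }
    total = sum(counts.values())
    return {op: (n, round(100 * n / total)) for op, n in counts.items()}


if __name__ == "__main__":
    for op, (n, pct) in adjusted_mix(RAW_ADD, RAW_MUL, RAW_RECIP).items():
        print(f"{op}: {n:5d}  => {pct:2d}%")
```

If some reciprocals were not part of a full divide, the true multiply share
would sit somewhere between the raw and adjusted figures.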

The surprising thing (to me) is how close the + and * numbers are.  What
this unfortunately means is that the answer to John's question is not very
clear.  It involves answering questions like "how frequently can an add be
overlapped with a multiply?", and "how often is an add on the critical
path?"  F.P. adds are not so abundant (relative to multiplies) that the
question can be dismissed, but favoring either operation at the other's
expense is certainly not something I'd recommend without a lot of
supporting evidence, and even then it looks to be a fairly marginal
optimization.  The silicon might be better invested in doing something else
(maybe hardware sqrt?).
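
As a rough illustration of why the tradeoff looks marginal, here is a
deliberately crude Python sketch that prices the two latency choices from
the quoted article against the adjusted mix.  It assumes every add and
multiply is serialized at its full latency, with no overlap or pipelining,
which is exactly what the questions above say we do not know.  The model and
the function name are hypothetical, not anything measured.

```python
# Adjusted counts from the table above, in units of 10^9 operations.
ADDS = 1198
MULS = 941


def serial_cycles(add_latency, mul_latency, adds=ADDS, muls=MULS):
    """Total cycles if every op sits on the critical path (toy worst case)."""
    return adds * add_latency + muls * mul_latency


fast_add = serial_cycles(add_latency=2, mul_latency=5)  # 2-cycle +, 5-cycle *
fast_mul = serial_cycles(add_latency=3, mul_latency=4)  # 3-cycle +, 4-cycle *

# Relative difference between the two designs under this toy model.
gap = (fast_mul - fast_add) / fast_add
print(fast_add, fast_mul, f"{gap:.1%}")
```

Under these assumptions the two designs differ by only a few percent, and
plugging in the raw multiply count (1346) instead of the adjusted one flips
which design comes out ahead.  Any real overlap of adds with multiplies
would move the figure in ways this model cannot capture.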

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.