Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!samsung!usc!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: Novice question:  measuring speed
Message-ID: <1060@spim.mips.COM>
Date: 15 Mar 91 21:43:50 GMT
References: <645@ssdc?> <3516:Mar1319:50:3291@kramden.acf.nyu.edu>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 51
Nntp-Posting-Host: winchester.mips.com

In article <3516:Mar1319:50:3291@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In contrast, MFLOPS measure some (supposedly) real amount of work
>getting done. The number of floating-point operations in a typical
>computation is relatively independent of the machine at hand. Of course,
>MFLOPS don't tell you whether floating-point divisions are ridiculously
>slow, and they don't tell you how non-floating-point computations will
>run, but they're at least a bit more solid than MIPS.

No, they're not....
Read Hennessy & Patterson, pages 43-44.

The only reason MFLOPS might sometimes mean something is that, if
they're (fully-qualified, i.e., FORTRAN, 64-bit) LINPACK MFLOPS, then
you are actually talking about performance as measured on a specific
benchmark, which is rather different from talking about MFLOPS in
general.
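To make "a specific benchmark" concrete: LINPACK MFLOPS are the benchmark's nominal operation count for solving a dense n x n system (2n^3/3 + 2n^2, with n = 100 in the classic run) divided by run time. They are not a count of what the hardware actually executed. The sketch below is a hypothetical pure-Python stand-in, not the real FORTRAN LINPACK code, and the function names are illustrative only. On a given machine it will report far fewer MFLOPS than a compiled FORTRAN version would, which is why the qualifiers (language, precision, benchmark) have to travel with the number.

```python
import random
import time


def linpack_flops(n: int) -> float:
    """Nominal LINPACK operation count for an n x n solve: 2n^3/3 + 2n^2.

    This is what the benchmark *credits*, independent of how many
    floating-point operations the code or hardware actually performs.
    """
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Python floats are IEEE 754 doubles, so this is a 64-bit computation.
    Modifies a and b in place and returns x.
    """
    n = len(a)
    for k in range(n):
        # Partial pivoting: bring the row with the largest |a[i][k]| to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_mflops(n: int = 100, seed: int = 1) -> float:
    """Time one solve of a random n x n system and report nominal MFLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    return linpack_flops(n) / (elapsed * 1e6)


if __name__ == "__main__":
    print(f"{linpack_style_mflops():.2f} LINPACK-style MFLOPS (n=100, pure Python)")
```

The same hardware running a different implementation of the same solve would credit the same operation count but report a different MFLOPS figure, so the number describes the benchmark run, not the machine in the abstract.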

As has been discussed numerous times in the past:
	Vendor-published mips-ratings are essentially meaningless.
	No single number captures the performance differences
	among machines.
	Dhrystone VAX-mips ratings almost always over-predict the
	performance of modern machines relative to a VAX-11/780, compared
	to their performance on realistic programs.
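A Dhrystone VAX-mips rating is conventionally the Dhrystones-per-second score divided by 1757, the figure usually credited to the VAX-11/780. A minimal sketch of the over-prediction check, assuming you have your own timings of one realistic program on both a VAX-11/780 and the new machine (the function names are hypothetical):

```python
# Conventional VAX-11/780 Dhrystone score used as the "1 mips" baseline.
VAX_780_DHRYSTONES_PER_SEC = 1757.0


def dhrystone_vax_mips(dhrystones_per_sec: float) -> float:
    """Dhrystone 'VAX mips': score relative to the VAX-11/780 baseline."""
    return dhrystones_per_sec / VAX_780_DHRYSTONES_PER_SEC


def vax_relative(vax_seconds: float, machine_seconds: float) -> float:
    """Speedup over the VAX-11/780 on one realistic program (your own timings)."""
    return vax_seconds / machine_seconds


def overprediction(dhrystones_per_sec: float, vax_seconds: float,
                   machine_seconds: float) -> float:
    """Ratio of claimed to observed speedup; > 1.0 means Dhrystone over-predicts."""
    return (dhrystone_vax_mips(dhrystones_per_sec)
            / vax_relative(vax_seconds, machine_seconds))
```

If the claim above holds for your workload, `overprediction` will come out above 1.0 for most modern machines.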


So far, the single mips-related measure that is available for a wide
range of machines, and that has some consistency and predictive value,
is, in my opinion, the SPEC integer subset, because:
	The 4 programs are enough larger than Dhrystone and the like to
	avoid silly cache effects (though perhaps they're still not quite
	big enough).
	They're real programs, and hard to inflate with compiler gimmicks.
	They're fairly consistent, i.e., the VAX-relative variance is
	fairly low.
	(The above are reasonably verifiable.  In addition, they correlate
	reasonably well with some larger, more stressful, but proprietary
	benchmarks that lots of us use inside computer companies.)
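For reference, SPEC89 scores each program as a ratio of the VAX-11/780 reference time to the measured time and summarizes with a geometric mean; its integer subset is gcc, espresso, li, and eqntott. The sketch below, with hypothetical helper names, computes those ratios plus one possible measure of the "VAX-relative variance" mentioned above: the standard deviation of the log ratios. That spread statistic is an illustrative choice, not something SPEC defines, and no timings are supplied; use your own measurements.

```python
import math


def spec_ratios(ref_seconds: dict, machine_seconds: dict) -> dict:
    """Per-program ratio: VAX-11/780 reference time / measured time."""
    return {name: ref_seconds[name] / machine_seconds[name] for name in ref_seconds}


def geometric_mean(values) -> float:
    """Geometric mean, the summary SPEC89 uses across its ratios."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def log_spread(values) -> float:
    """Population std. dev. of ln(ratio); 0.0 means every program sees the
    same speedup, larger values mean the single summary number hides more."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    return math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))


# Usage, with your own measurements filled in (none are provided here):
# ref = {"gcc": ..., "espresso": ..., "li": ..., "eqntott": ...}
# yours = {"gcc": ..., "espresso": ..., "li": ..., "eqntott": ...}
# r = spec_ratios(ref, yours)
# print(geometric_mean(r.values()), log_spread(r.values()))
```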
The SPEC FP subset is also useful, although the benchmark-to-benchmark
variance is much higher, so you need to do more of your own benchmarking
to figure out which of your programs track which of its benchmarks.
(This is the nature of FP programs in general.)
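One way to do that calibration, sketched under assumptions: measure your program on the same set of machines for which you have SPEC FP ratios, then see which SPEC FP benchmark's ratios move with yours. Correlating log ratios across machines is an illustrative choice of statistic, not a SPEC procedure, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation; needs >= 2 points and non-constant inputs,
    otherwise the denominator is zero and ZeroDivisionError is raised."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_proxies(your_ratios, spec_fp_ratios):
    """your_ratios: your program's speed ratios, one per machine.
    spec_fp_ratios: {benchmark: ratios on the same machines, same order}.
    Returns (benchmark, correlation) pairs, best-tracking first."""
    yours = [math.log(r) for r in your_ratios]
    scores = {
        name: pearson(yours, [math.log(r) for r in ratios])
        for name, ratios in spec_fp_ratios.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A benchmark near the top with a correlation close to 1.0 is a reasonable stand-in for your program on machines you have not measured; with few machines the estimate is noisy.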

In general, Chapter 2 of Hennessy & Patterson is good reading.

-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086