Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!rutgers!husc6!panda!genrad!decvax!decwrl!labrea!glacier!mips!mash
From: mash@mips.UUCP
Newsgroups: net.arch
Subject: Re: Re: Floating point performance & Mr. Mashey's Mythical Mhz
Message-ID: <727@mips.UUCP>
Date: Sun, 19-Oct-86 04:19:32 EDT
Article-I.D.: mips.727
Posted: Sun Oct 19 04:19:32 1986
Date-Received: Tue, 21-Oct-86 21:32:10 EDT
References: <340@euroies.UUCP> <1989@videovax.UUCP> <722@mips.UUCP> <377@garth.UUCP>
Reply-To: mash@mips.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 139

In article <377@garth.UUCP> kissell@garth.UUCP (Kevin Kissell) writes:
>In article <722@mips.UUCP> mash@mips.UUCP (John Mashey) writes:

>...that are familiar to John and myself and yet of interest to the newsgroup:
>the MIPS R2000 and the Fairchild Clipper.  An 8 Mhz R2000 has a cycle time
>of 125ns.  A 33Mhz Clipper has a cycle time of 30ns.  Yet both are built
>with essentially the same 2-micron CMOS technology.  I somehow doubt that
>Fairchild's CMOS transistors switch four times faster than that of whoever
>is secretly building R2000s this week.  The difference is architectural.

(One of my colleagues got here first, hansen@mips, in 726@mips.UUCP,
so I'll just add a few notes where they don't overlap too much.)

There was no intent in the original posting to start a MIPS versus 
Clipper war [contrary to John Gilmore's posting in <1198@hoptoad.uucp>:
sorry John, another Moto versus Intel battle we do not need, fun though
it may be to watch!]  I was only trying to be reasonably inclusive of 
relevant 32-bit micros.  However, now that the issue has been raised.....

An 8Mhz R2000 isn't pushing the technology very hard, ON PURPOSE!!!
8Mhz parts appear first, followed by 12s and 16s, for the same reasons you got
12Mhz 68020s before 16s and 25s. Also, I'm told that the 2u design doesn't
push 2u technology as hard as it might have, in order to let the same
design be shrunk to 1.5u and 1.2u with minimal effort.

Now, the reason one might care about MWhets/MHz (or any similar measure
that compares the delivered real performance with some basic technology
speed) is to understand the margin and headroom in a design.
Since Kevin brought the issue up, some hypothetical questions:
	a) Will there be 66Mhz Clippers in 2u CMOS?
		[To get actual performance like 16Mhz R2000 in 2u;]
		[If the answer is yes, I know a bunch of people, not all at
		MIPS, either, who have some real tough questions involving
		transmission-line effects, how to do ECL or other reduced-
		voltage-swing I/O, etc.]
	b) If they will be, what year will they be?
		[1987?]
	c) When will there be bigger / (more in parallel) CAMMU chips?
		[Because if there aren't, how are the caches going to
		get big enough to keep the delivered performance (for real
		programs) in line with the CPU clock speed improvements?
		Chips get faster with shrinks, but they don't magically
		get re-laid-out to acquire more memory.  CAMMU chips have
		some good ideas in them, but they're not very big, especially
		compared with the needs of some of the real programs that
		people would like to run on high-performance micros. (There
		is some real nasty stuff lurking out there!  People keep
		putting it on our machines, so we know....If the Clipper
		FORTRAN compilers just came up recently, and they haven't
		yet tried running 500KLOC FORTRAN programs...interesting
		times are ahead....)]
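
For scale, the clock rates in question a) can be turned into cycle times
with simple arithmetic.  A minimal sketch; the function name is
illustrative and not from the original discussion:

```python
def cycle_time_ns(mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / mhz

# Clock rates mentioned in this thread.
for name, mhz in [("R2000", 8), ("R2000", 16), ("Clipper", 33), ("Clipper", 66)]:
    print(f"{name:8s} {mhz:3d} Mhz  {cycle_time_ns(mhz):6.2f} ns")
```

A 66Mhz part has a cycle time of roughly 15ns, which gives some idea why
the questions about transmission-line effects and reduced-voltage-swing
I/O come up.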
>
>The Clipper was designed from fairly well-established supercomputer and
>mainframe techniques....

"fairly well-established supercomputer and mainframe techniques"
is interesting.  I can think of 2 ways to read this assertion:
	a) High-performance VLSI designs should be done just like big
	machines.
OR
	b) High-performance VLSI should be designed with good understanding
	of big machines, as well as good understanding of the tradeoffs
	necessary for VLSI [margin, headroom, packaging constraints, processes,
	etc, etc], where those are different from the design tradeoffs of
	the big ECL boxes.
I hope Kevin meant b), which most people would agree with.
>
>John's guess for the Clipper is off by over a factor of two.  The Clipper

Thanks for the info: all I'd seen were random guesses from people around
the net, and it's a useful contribution to see numbers from somebody
that knows.  Hopefully, we'll see more?  [I assume that was DP?]

>FORTRAN compiler was brought up only recently.  In its present sane but
>unoptimizing state, I obtained the following result on an Interpro 32C
>running CLIX System V.3 at 33 Mhz (1 wait state), using a prototype Green
>Hills Clipper FORTRAN compiler with Fairchild math libraries:
>
>		Mhz	Kwhet	Kwhet/Mhz
>Clipper		33	2920	Who cares? Kwhet/Kg and Kwhet/cm2 are of
>				more practical consequence.

As hansen@mips noted, these are reasonable results, and I'd assume they'll
improve somewhat with more mature compiler technology.
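
For reference, the Kwhet/Mhz entry left blank in the quoted table can be
worked out from the two numbers given there.  A minimal sketch; the
function name is illustrative:

```python
def kwhets_per_mhz(kwhets, mhz):
    """Delivered Whetstone performance per MHz of clock rate."""
    return kwhets / mhz

# Figures from the quoted Clipper result: 2920 Kwhet at 33 Mhz.
clipper = kwhets_per_mhz(2920, 33)
print(f"Clipper: {clipper:.1f} Kwhet/Mhz")
```

That works out to about 88.5 Kwhet/Mhz for the unoptimized compiler.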

Actually, this raises a set of questions that might be of general interest
in this newsgroup, basically:
1) What metrics are interesting?
2) How do you define them?
3) In what problem domains are they relevant?
4) What are different constraints that people use?
5) How do different metrics correlate, specifically, are some of the simpler
(easier-to-measure) good predictors of the more complex ones?

For example, here are some metrics, all of which have appeared in this
newsgroup at some time or other.  Proposals are solicited:

a) Clock rate. (Mhz) --
b) Peak Mips [i.e., typically back-to-back cached, register-register adds]. --
c) Sustained Mips ?
d) Benchmark performance relative to other computers  ++
e) Peak Mflops [i.e., "" "" for FP] --
f) Dhrystones
g) Whetstones +
h) LINPACK MFLops ++
i) Kwhets / Mflops [g/e] -
j) Kwhets / Mhz [g/a] +
k) Kg
l) cm2 (or cm3)
m) Watts
n) $$ +++
o) Kwhets / Kg [g/k]
p) Kwhets / cm2 [g/l] +
q) Kwhets / Watt [g/m] +
r) (any of the above) / $$ +++(esp if d))
---------
(-- & ++ indicate general impression of these metrics)

What's interesting is that people have all sorts of different constraint
combinations or optimization functions over any of these.  Let me try
a few examples, and solicit some more:
1) Maximize g), h) etc, subject to few constraints, i.e., for people who
	buy CRAYs, etc, money is (almost) no object.
2) Maximize one of the performance numbers, subject to some constraint.
	The constraint might be:
	absolute cm2 or cm3, as in some avionics things, i.e., if it
	doesn't fit, it doesn't matter how fast it is!
	$$: get me the most for some fixed amount of money, and I don't
	care if something over budget is 2X faster, even if it's more
	cost-effective.
3) Performance may not be particularly important at all, relative to
object-code compatibility, software availability, service, etc.
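
The first two cases above amount to maximizing one metric subject to hard
limits on others.  A minimal sketch of that selection follows.  The
machines and every number attached to them are hypothetical, invented
only to show how a hard limit can pick a different machine than a
cost-effectiveness ratio would; the names Machine, best and
most_cost_effective are likewise illustrative.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    kwhets: float   # metric g)
    cm3: float      # metric l)
    dollars: float  # metric n)

# Hypothetical machines: none of these numbers are measurements.
CANDIDATES = [
    Machine("A", kwhets=1000, cm3=2000, dollars=10000),
    Machine("B", kwhets=3000, cm3=9000, dollars=25000),
    Machine("C", kwhets=8000, cm3=40000, dollars=90000),
]

def best(machines, max_cm3=None, max_dollars=None):
    """Fastest machine (by Kwhets) within every given limit, or None."""
    feasible = [
        m for m in machines
        if (max_cm3 is None or m.cm3 <= max_cm3)
        and (max_dollars is None or m.dollars <= max_dollars)
    ]
    return max(feasible, key=lambda m: m.kwhets, default=None)

def most_cost_effective(machines):
    """Machine with the highest Kwhets per dollar, ignoring all limits."""
    return max(machines, key=lambda m: m.kwhets / m.dollars)

print(best(CANDIDATES).name)                      # case 1: few constraints
print(best(CANDIDATES, max_dollars=15000).name)   # case 2: fixed budget
print(most_cost_effective(CANDIDATES).name)       # ratio metric r)
```

With these made-up numbers the unconstrained choice is C, the fixed-budget
choice is A, and the best Kwhets/$$ is B: three different answers from
the same data, depending on which constraint the buyer actually has.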

Comments? What sorts of metrics are important to the people who read
this newsgroup? What kinds of constraints?  How do you buy machines?
If you buy CPU chips, how do you decide what to pick?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086