Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!columbia!rutgers!sri-spam!sri-unix!hplabs!pyramid!octopus!harvax!garth!kissell
From: kissell@garth.UUCP (Kevin Kissell)
Newsgroups: net.arch
Subject: Re: Floating point performance & Mr. Mashey's Mythical Mhz
Message-ID: <377@garth.UUCP>
Date: Wed, 15-Oct-86 18:34:44 EDT
Article-I.D.: garth.377
Posted: Wed Oct 15 18:34:44 1986
Date-Received: Thu, 16-Oct-86 22:20:09 EDT
References: <340@euroies.UUCP> <1989@videovax.UUCP> <722@mips.UUCP>
Reply-To: kissell@garth.UUCP (Kevin Kissell)
Organization: Fairchild APD -- Palo Alto, CA
Lines: 74

Keywords:


In article <722@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>However, a useful attribute of Roger's measures (or variants thereof)
>is that looking at the measure (units of real performance) per Mhz
>gives you some idea of architectural efficiency, i.e., smaller numbers are
>better, in that (cycle time) is likely to be a property of the technology,
>and hard to improve, at a given level of technology. [This is clearly
>a RISC-style argument of reducing the cycle count for delivered performance,
>and then letting technology carry you forward.]  Using the numbers above,
>one gets KiloWhets / Mhz, for example:

I don't understand how someone of John's sophistication can insist on
repeating such a clearly fallacious argument.  The statement "cycle time
is likely to be a property of the technology" is simply untrue, as I have
pointed out in previous postings.  Cycle time is the product of gate delays
(a property of technology) and the number of sequential gates between latches
(a property of architecture).  For example, let us consider two machines
that are familiar to John and myself and yet of interest to the newsgroup:
the MIPS R2000 and the Fairchild Clipper.  An 8 Mhz R2000 has a cycle time
of 125ns.  A 33 Mhz Clipper has a cycle time of about 30ns.  Yet both are built
with essentially the same 2-micron CMOS technology.  I somehow doubt that
Fairchild's CMOS transistors switch four times faster than those of whoever
is secretly building R2000s this week.  The difference is architectural.
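
The cycle-time argument above can be put as arithmetic: cycle time equals
gate delay times the number of gate levels between latches.  The sketch
below takes the two clock rates from this post and a hypothetical per-gate
delay (GATE_DELAY_NS is an illustrative assumption, not a measured 2-micron
CMOS figure).  If both chips share the same gate delay, that delay cancels
out, and the ratio of logic depths equals the ratio of cycle times, whatever
value is assumed.

```python
# Cycle time = gate delay * number of sequential gate levels between latches.
# Clock rates come from the post; GATE_DELAY_NS is a hypothetical value.

R2000_MHZ = 8.0
CLIPPER_MHZ = 33.0
GATE_DELAY_NS = 2.5  # hypothetical per-gate delay, assumed equal for both chips


def cycle_time_ns(clock_mhz: float) -> float:
    """Cycle time in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


def gate_levels(cycle_ns: float, gate_delay_ns: float) -> float:
    """Sequential gate levels between latches that fit in one cycle."""
    return cycle_ns / gate_delay_ns


r2000_cycle = cycle_time_ns(R2000_MHZ)      # 125 ns
clipper_cycle = cycle_time_ns(CLIPPER_MHZ)  # about 30.3 ns

r2000_levels = gate_levels(r2000_cycle, GATE_DELAY_NS)
clipper_levels = gate_levels(clipper_cycle, GATE_DELAY_NS)

# With the same gate delay on both chips, the gate delay cancels out:
# the logic-depth ratio equals the cycle-time ratio (33/8, about 4.1).
depth_ratio = r2000_levels / clipper_levels

print(f"R2000:   {r2000_cycle:6.1f} ns per cycle, {r2000_levels:5.1f} gate levels")
print(f"Clipper: {clipper_cycle:6.1f} ns per cycle, {clipper_levels:5.1f} gate levels")
print(f"R2000/Clipper logic depth ratio: {depth_ratio:.2f}")
```

Changing GATE_DELAY_NS moves both gate-level counts but leaves the ratio
unchanged, which is the point: the factor of four lives in the logic
between latches, not in the transistors.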

As I understand it, the R2000 was designed to take advantage of delayed
load/branch techniques, and to execute instructions in a small number of
clocks, which in fact go hand-in-hand.  A load or branch can take as little
as two clocks.  But the addition of two numbers cannot take less than one
clock, and so the ALU has a leisurely 125ns to do something that it could
in principle have done more quickly, had the design been more heavily pipelined.

The Clipper was designed using fairly well-established supercomputer and
mainframe techniques.  The cycle time is the time required to do the smallest
amount of useful work, an integer ALU operation, at 30ns.  Other instructions
must then, of course, take multiples of that basic unit.  Assuming cache hits,
a load takes 4 or 6 clocks (120 or 180ns, vs. 250ns for the R2000) and a
branch takes 9 (270ns vs. 250ns for the R2000).
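
The latency comparison above is just clocks multiplied by cycle time.  A
minimal sketch using the clock counts given in this post (cache hits
assumed) and the rounded 30ns Clipper cycle; the post does not say which
Clipper loads take 4 clocks and which take 6, so those two cases are labeled
only by clock count.

```python
from typing import Dict

# Latency in ns = clocks * cycle time, with clock counts from the post
# (cache hits assumed).  The Clipper cycle is the post's rounded 30 ns.
R2000_CYCLE_NS = 125
CLIPPER_CYCLE_NS = 30

R2000_CLOCKS = {"alu": 1, "load": 2, "branch": 2}
# The post gives Clipper loads as 4 or 6 clocks without saying which loads
# take which, so the two cases are labeled only by clock count.
CLIPPER_CLOCKS = {"alu": 1, "load_4clk": 4, "load_6clk": 6, "branch": 9}


def latencies_ns(clocks: Dict[str, int], cycle_ns: int) -> Dict[str, int]:
    """Map each instruction class to its latency in nanoseconds."""
    return {op: n * cycle_ns for op, n in clocks.items()}


r2000 = latencies_ns(R2000_CLOCKS, R2000_CYCLE_NS)
clipper = latencies_ns(CLIPPER_CLOCKS, CLIPPER_CYCLE_NS)

for name, table in (("R2000", r2000), ("Clipper", clipper)):
    for op, ns in table.items():
        print(f"{name:8s} {op:10s} {ns:4d} ns")
```

The same one-line formula reproduces every figure in the paragraph: 30 vs.
125ns for an ALU operation, 120 or 180 vs. 250ns for a load, and 270 vs.
250ns for a branch.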

It should be noted that both machines allow for the overlapped execution
of instructions, but in different ways.  The R2000 overlaps register
operations with loads and branches using delay slots.  The Clipper
overlaps loads but not branches, using resource scoreboarding instead
of delay slots.  This means that the R2000 can branch more efficiently
(assuming the assembler can fill the delay slot), but the Clipper can
have more instructions executing concurrently than the R2000 (4 vs 2)
in in-line code.
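
To make the scoreboarding point concrete, here is a toy model, not the
Clipper's actual pipeline: in-order issue of at most one instruction per
clock, with a scoreboard that holds an instruction only until the registers
it reads are ready (and any pending write to its destination has finished).
The register names, the instruction sequence, and the 4-clock load latency
(borrowed from the shorter Clipper load figure given earlier) are
illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    latency: int             # clocks until dest is available (assumed values)


def schedule(program: List[Instr]) -> List[int]:
    """Return the issue clock of each instruction under a simple scoreboard.

    In-order issue, at most one instruction per clock.  An instruction waits
    only for its source registers (read-after-write) and for any pending
    write to its own destination (write-after-write); otherwise it issues
    while earlier, longer operations are still in flight.
    """
    ready: Dict[str, int] = {}  # register -> clock at which its value is available
    issue_clocks: List[int] = []
    next_free = 0               # earliest clock the issue stage is free
    for ins in program:
        waits = [next_free] + [ready.get(r, 0) for r in ins.srcs]
        if ins.dest is not None:
            waits.append(ready.get(ins.dest, 0))
        start = max(waits)
        issue_clocks.append(start)
        if ins.dest is not None:
            ready[ins.dest] = start + ins.latency
        next_free = start + 1
    return issue_clocks


program = [
    Instr("load r1,(r10)", "r1", ("r10",), 4),
    Instr("add  r2,r3", "r2", ("r2", "r3"), 1),  # independent of the load
    Instr("add  r4,r5", "r4", ("r4", "r5"), 1),  # independent of the load
    Instr("add  r1,r2", "r1", ("r1", "r2"), 1),  # needs the loaded r1
]

for ins, clock in zip(program, schedule(program)):
    print(f"clock {clock}: {ins.text}")
```

In this model the two independent adds issue while the load is still
outstanding, and only the add that reads r1 waits, issuing at clock 4
instead of clock 3.  A delay-slot design like the R2000 instead relies on
the assembler to place an independent instruction in the slot after a load
or branch.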

Draw your own conclusions about "architectural efficiency".

>Machine	Mhz	KWhet	KWhet/Mhz
>80287		 8	 300	 40
>32332-32081	15	 728	 50		(these from Ray Curry,
>32332-32381	15	1200	 80		in <3833@nsc.UUCP>) (projected)
>32332-32310	15	1600	100*		"" "" (projected)
>Clipper?	33	1200?	 40		guess? anybody know better #?
>68881		12.5	 755	 60		(from discussion)
>68881		20	1240	 60		claimed by Moto, in SUN3-260
>SUN FPA	16.6	1700	100*		DP (from Hough) (in SUN3-160)
>MIPS R2360	 8	1160	140*		DP (interim, with restrictions)
>MIPS R2010	 8	4500	560		DP (simulated)

John's guess for the Clipper is off by over a factor of two.  The Clipper
FORTRAN compiler was brought up only recently.  In its present sane but
unoptimizing state, I obtained the following result on an Interpro 32C
running CLIX System V.3 at 33 Mhz (1 wait state), using a prototype Green
Hills Clipper FORTRAN compiler with Fairchild math libraries:

		Mhz	KWhet	KWhet/Mhz
Clipper		33	2920	Who cares?  KWhet/kg and KWhet/cm2 are of
				more practical consequence.
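
For readers who still want the figure of merit the quoted table uses,
KWhet/Mhz is a single division per row.  This sketch recomputes it from the
quoted rows (projected and simulated entries included, as labeled there) and
adds the measured Clipper result from this post, which also checks the
"factor of two" remark: 2920 / 1200 is about 2.4.  The row labels are
shortened for printing and are not part of the original table.

```python
# (machine, clock in Mhz, KWhet) rows from the quoted table, plus the
# Clipper figure measured in this post with an unoptimizing compiler.
ROWS = [
    ("80287", 8.0, 300),
    ("32332-32081", 15.0, 728),
    ("32332-32381", 15.0, 1200),   # projected
    ("32332-32310", 15.0, 1600),   # projected
    ("Clipper (guess)", 33.0, 1200),
    ("68881", 12.5, 755),
    ("68881 (SUN3-260)", 20.0, 1240),
    ("SUN FPA", 16.6, 1700),
    ("MIPS R2360", 8.0, 1160),
    ("MIPS R2010", 8.0, 4500),     # simulated
    ("Clipper (measured)", 33.0, 2920),
]


def kwhet_per_mhz(mhz: float, kwhet: int) -> float:
    """Whetstone thousands per MHz of clock."""
    return kwhet / mhz


for name, mhz, kwhet in ROWS:
    print(f"{name:20s} {mhz:5.1f} {kwhet:5d} {kwhet_per_mhz(mhz, kwhet):6.1f}")

guess_error = 2920 / 1200  # measured Clipper KWhet over the guessed value
print(f"measured / guessed Clipper KWhet: {guess_error:.2f}")
```

The quoted ratios are rounded to the nearest ten; the unrounded values are
close to them.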


Kevin D. Kissell
Fairchild Advanced Processor Division