Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rochester!crowl
From: crowl@rochester.ARPA (Lawrence Crowl)
Newsgroups: net.arch
Subject: Re: Floating point performance & Mr. Mashey's Mythical Mhz
Message-ID: <21944@rochester.ARPA>
Date: Mon, 27-Oct-86 15:18:05 EST
Article-I.D.: rocheste.21944
Posted: Mon Oct 27 15:18:05 1986
Date-Received: Mon, 27-Oct-86 22:36:19 EST
References: <340@euroies.UUCP> <1989@videovax.UUCP> <722@mips.UUCP> <377@garth.UUCP> <727@mips.UUCP> <103@unc.unc.UUCP>
Reply-To: crowl@rochtest.UUCP (Lawrence Crowl)
Organization: U of Rochester, CS Dept, Rochester, NY
Lines: 42

>>In article <727@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>> Now, the reason one might care about MWhets/MHz (or any similar measure
>> that compares the delivered real performance with some basic technology
>> speed) is to understand the margin and headroom in a design.

>In article <103@unc.unc.UUCP> rentsch@unc.UUCP (Tim Rentsch) writes:
> There is a subtle pitfall in arguing that FLOPS/HZ (or IPS/HZ) is a measure
> of architectural "goodness".  Certainly, measuring FLOPS/HZ is a reasonable
> attempt to factor out the particulars of the device fabrication, which are
> obviously irrelevant to architecture.  ...  BUT -- and here is the pitfall
> -- it just might be that given identical fabrication methods, the better
> FLOPS/HZ choice would still run slower because it would not support
> the higher clock rate.  

Perhaps what we are missing is that, for a given level of technology, a longer
clock cycle allows a greater depth of combinational circuitry.  That is, each
clock cycle can propagate signals through more gates.  So, a 4 MHz clock which
governs propagation through a combinational circuit 4 gates deep will do
roughly the same work as a 1 MHz clock governing propagation through a
combinational circuit 16 gates deep.  Perhaps a better measure is the depth of
gates required to implement a FLOP (or an instruction, or a window, etc.).
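
A minimal sketch of that arithmetic, in Python.  The 4 MHz / 4 gate and
1 MHz / 16 gate figures are the ones above; the 64-gate depth for a FLOP is a
made-up number for illustration only, and the model ignores latch and wiring
delays entirely.

```python
def gate_levels_per_second(clock_hz, gates_per_cycle):
    """Gate levels of combinational logic traversed per second."""
    return clock_hz * gates_per_cycle


def seconds_per_operation(gate_depth, clock_hz, gates_per_cycle):
    """Time to push one operation through gate_depth levels of logic,
    rounded up to a whole number of clock cycles."""
    cycles = -(-gate_depth // gates_per_cycle)  # ceiling division
    return cycles / clock_hz


# The two design points from the text.
fast_shallow = gate_levels_per_second(4_000_000, 4)
slow_deep = gate_levels_per_second(1_000_000, 16)

# Hypothetical gate depth of one floating point operation (an assumption).
FLOP_DEPTH = 64
t_fast = seconds_per_operation(FLOP_DEPTH, 4_000_000, 4)   # 16 cycles
t_slow = seconds_per_operation(FLOP_DEPTH, 1_000_000, 16)  # 4 cycles

print(fast_shallow, slow_deep)
print(t_fast, t_slow)
```

Under these assumptions both designs traverse the same number of gate levels
per second and finish the operation in the same time, even though their
FLOPS/HZ figures differ by a factor of four.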

The very fast clock, heavily pipelined machines like the Cray and Clipper
follow the first approach, while the slower clock, less pipelined machines
like the Berkeley RISC and MIPS follow the second approach.  Which is better
probably depends on the technology used to implement the architecture and on
the desired speed.  For instance, if we want a very fast vector processor, we
should probably choose the fast clock, more pipelined architecture.  If we want
a better price/performance ratio, we should probably choose the slow clock,
less pipelined architecture.
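
One way to see the vector-processor half of this is a rough timing model,
sketched here in Python.  It is an assumption of mine, not a measurement of
any of the machines named: the logic for one operation is split evenly across
the pipeline stages, and each stage pays a fixed latch overhead expressed in
gate delays.  All the numbers are hypothetical, and cost (hence
price/performance) is not modeled at all.

```python
def pipeline_timing(total_depth, stages, latch_overhead, gate_delay_ns):
    """Return (cycle_ns, latency_ns) for an evenly divided pipeline.

    total_depth    -- gate levels needed for one whole operation
    stages         -- number of pipeline stages
    latch_overhead -- extra gate delays per stage for the pipeline latch
    gate_delay_ns  -- delay of one gate level, in nanoseconds
    """
    cycle_ns = (total_depth / stages + latch_overhead) * gate_delay_ns
    latency_ns = stages * cycle_ns
    return cycle_ns, latency_ns


# Hypothetical figures: 64 gate levels per operation, 2 gate delays of
# latch overhead per stage, 1 ns per gate level.
shallow_cycle, shallow_latency = pipeline_timing(64, 4, 2, 1.0)
deep_cycle, deep_latency = pipeline_timing(64, 16, 2, 1.0)

print(shallow_cycle, shallow_latency)  # slow clock, few stages
print(deep_cycle, deep_latency)        # fast clock, many stages
```

With these numbers the deeper pipeline delivers a result every 6 ns instead of
every 18 ns once it is full, which suits long vector streams, but a single
isolated operation takes 96 ns instead of 72 ns.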

BOLD UNSUPPORTED CLAIM: The "best" architecture is technology dependent.  The
quality of an architecture depends on the technology used to implement it,
and no architecture is "best" under more than a limited range of technologies.
For instance, under technologies in which the bandwidth to memory is the
tightest limit, stack architectures (Burroughs, Lilith) will be "better".
Under technologies in which the ability to process instructions is the
tightest limit, the wide register-to-register architectures will be "better".
-- 
  Lawrence Crowl		716-275-5766	University of Rochester
			crowl@rochester.arpa	Computer Science Department
 ...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York,  14627