Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rochester!crowl
From: crowl@rochester.ARPA (Lawrence Crowl)
Newsgroups: net.arch
Subject: Re: Floating point performance & Mr. Mashey's Mythical Mhz
Message-ID: <21944@rochester.ARPA>
Date: Mon, 27-Oct-86 15:18:05 EST
Article-I.D.: rocheste.21944
Posted: Mon Oct 27 15:18:05 1986
Date-Received: Mon, 27-Oct-86 22:36:19 EST
References: <340@euroies.UUCP> <1989@videovax.UUCP> <722@mips.UUCP> <377@garth.UUCP> <727@mips.UUCP> <103@unc.unc.UUCP>
Reply-To: crowl@rochtest.UUCP (Lawrence Crowl)
Organization: U of Rochester, CS Dept, Rochester, NY
Lines: 42

>>In article <727@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>> Now, the reason one might care about MWhets/MHz (or any similar measure
>> that compares the delivered real performance with some basic technology
>> speed) is to understand the margin and headroom in a design.

>In article <103@unc.unc.UUCP> rentsch@unc.UUCP (Tim Rentsch) writes:
> There is a subtle pitfall in arguing that FLOPS/HZ (or IPS/HZ) is a measure
> of architectural "goodness". Certainly, measuring FLOPS/HZ is a reasonable
> attempt to factor out the particulars of the device fabrication, which are
> obviously irrelevant to architecture. ... BUT -- and here is the pitfall
> -- it just might be that given identical fabrication methods, the better
> FLOPS/HZ choice would still run slower because it would not support
> the higher clock rate.

Perhaps what we are missing is that, for a given level of technology, a
longer clock cycle allows a greater depth of combinational circuitry. That
is, each clock cycle can work through more gates. So a 4 MHz clock governing
propagation through a combinational circuit 4 gates deep will do roughly the
same work as a 1 MHz clock governing propagation through a combinational
circuit 16 gates deep: both traverse about 16 gate levels per microsecond.
Perhaps a better measure is the depth of gates required to implement a FLOP
(or an instruction, or a window, etc.).
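The arithmetic behind the 4 MHz / 4-gate versus 1 MHz / 16-gate comparison can be sketched as a toy model. This is only an illustration under loud assumptions: every gate level has the same delay, and latch overhead, clock skew and wire delay are ignored. The function names and the operation depths (48 and 20 gate levels) are hypothetical, chosen only to show the effect.

```python
import math

# Toy model (assumption): one clock cycle lets a signal ripple through
# `depth` levels of combinational gates; latch overhead, clock skew and
# wire delay are ignored.

def gate_levels_per_us(clock_mhz: float, depth: int) -> float:
    """Gate levels traversed per microsecond: cycles/us times levels/cycle."""
    return clock_mhz * depth

def time_for_op_us(clock_mhz: float, depth: int, op_depth: int) -> float:
    """Latency (us) of an operation needing `op_depth` gate levels.

    A partially used cycle still costs a whole cycle, hence the ceiling.
    """
    cycles = math.ceil(op_depth / depth)
    return cycles / clock_mhz

FAST_SHALLOW = (4.0, 4)   # 4 MHz clock, 4 gates deep per cycle
SLOW_DEEP = (1.0, 16)     # 1 MHz clock, 16 gates deep per cycle

# Hypothetical operation depths, not measurements of any real machine.
for op_depth in (48, 20):
    for name, (mhz, depth) in (("fast/shallow", FAST_SHALLOW),
                               ("slow/deep", SLOW_DEEP)):
        print(name, op_depth, gate_levels_per_us(mhz, depth),
              time_for_op_us(mhz, depth, op_depth))
```

Both designs traverse 16 gate levels per microsecond, and for a 48-level operation both take 3 microseconds. For a 20-level operation the fast/shallow design takes 1.25 microseconds while the slow/deep design rounds up to two full cycles (2 microseconds), which hints at why the gate depth of the operation itself, rather than FLOPS/Hz alone, may be the more telling measure.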
The very fast clock, heavily pipelined machines like the Cray and Clipper
follow the first approach (fast clock, shallow logic per cycle), while the
slower clock, less pipelined machines like the Berkeley RISC and MIPS follow
the second (slower clock, deeper logic per cycle). Which is better probably
depends on the technology used to implement the architecture and on the
desired speed. For instance, if we want a very fast vector processor, we
should probably choose the fast clock, more pipelined architecture. If we
want a better price/performance ratio, we should probably choose the slow
clock, less pipelined architecture.

BOLD UNSUPPORTED CLAIM: The "best" architecture is technology dependent.

The quality of an architecture depends on the technology used to implement
it, and no architecture is "best" across more than a limited range of
technologies. For instance, under technologies in which memory bandwidth is
the most limited resource, stack architectures (Burroughs, Lilith) will be
"better". Under technologies where the ability to process instructions is
most limited, wide register-to-register architectures will be "better".
-- 
Lawrence Crowl               716-275-5766     University of Rochester
                     crowl@rochester.arpa     Computer Science Department
...!{allegra,decvax,seismo}!rochester!crowl   Rochester, New York, 14627