Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!columbia!rutgers!sri-spam!sri-unix!hplabs!pyramid!octopus!harvax!garth!kissell
From: kissell@garth.UUCP (Kevin Kissell)
Newsgroups: net.arch
Subject: Re: Floating point performance & Mr. Mashey's Mythical Mhz
Message-ID: <377@garth.UUCP>
Date: Wed, 15-Oct-86 18:34:44 EDT
Article-I.D.: garth.377
Posted: Wed Oct 15 18:34:44 1986
Date-Received: Thu, 16-Oct-86 22:20:09 EDT
References: <340@euroies.UUCP> <1989@videovax.UUCP> <722@mips.UUCP>
Reply-To: kissell@garth.UUCP (Kevin Kissell)
Organization: Fairchild APD -- Palo Alto, CA
Lines: 74
Keywords:

In article <722@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
>However, a useful attribute of Roger's measure's (or variant thereof)
>is that looking at the measure (units of real performance) per Mhz,
>you some idea of architectural efficiency, i.e., smaller numbers are
>better, in that (cycle time) is likely to be a property of the technology,
>and hard to improve, at a given level of technology.  [This is clearly
>a RISC-style argument of reducing the cycle count for delivered performance,
>and then letting technology carry you forward.]  Using the numbers above,
>one gets KiloWhets / Mhz, for example:

I don't understand how someone of John's sophistication can insist on
repeating such a clearly fallacious argument.  The statement "cycle time
is likely to be a property of the technology" is simply untrue, as I have
pointed out in previous postings.  Cycle time is the product of gate
delays (a property of technology) and the number of sequential gates
between latches (a property of architecture).

For example, let us consider two machines that are familiar to John and
myself and yet of interest to the newsgroup: the MIPS R2000 and the
Fairchild Clipper.  An 8 Mhz R2000 has a cycle time of 125ns.  A 33 Mhz
Clipper has a cycle time of 30ns.  Yet both are built with essentially
the same 2-micron CMOS technology.
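
The arithmetic behind those two cycle times can be sketched as follows.
This is only an illustration: the function name is mine, and the final
ratio rests on the premise stated above that both parts have the same
gate delay, in which case the ratio of cycle times is also the ratio of
sequential gates between latches.

```python
def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz

# Clock rates quoted in the article.
r2000_ns = cycle_time_ns(8.0)      # 125 ns
clipper_ns = cycle_time_ns(33.0)   # a little over 30 ns

# cycle time = gate delay * sequential gates between latches.
# ASSUMPTION: equal gate delay for both parts (same 2-micron CMOS),
# so the gate delay cancels and only the logic depth ratio remains.
depth_ratio = r2000_ns / clipper_ns

print(f"R2000:   {r2000_ns:.1f} ns per cycle")
print(f"Clipper: {clipper_ns:.1f} ns per cycle")
print(f"Implied ratio of logic depth per cycle: {depth_ratio:.2f}")
```

The ratio comes out slightly above four, which is the "four times"
figure used in the next paragraph.
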
I somehow doubt that Fairchild's CMOS transistors switch four times
faster than those of whoever is secretly building R2000s this week.
The difference is architectural.

As I understand it, the R2000 was designed to take advantage of delayed
load/branch techniques, and to execute instructions in a small number of
clocks, which in fact go hand-in-hand.  A load or branch can take as
little as two clocks.  But the addition of two numbers cannot take less
than one clock, and so the ALU has a leisurely 125ns to do something that
it could in principle have done more quickly, had it been more heavily
pipelined.

The Clipper was designed from fairly well-established supercomputer and
mainframe techniques.  The cycle time is the time required to do the
smallest amount of useful work - an integer ALU operation at 30ns.
Other instructions must then of course be multiples of that basic unit.
Assuming cache hits, a load takes 4/6 clocks (120/180ns vs 250ns for the
R2000) and a branch takes 9 (270ns vs. 250ns for the R2000).

It should be noted that both machines allow for the overlapped execution
of instructions, but in different ways.  The R2000 overlaps register
operations with loads and branches using delay slots.  The Clipper
overlaps loads but not branches, using resource scoreboarding instead of
delay slots.  This means that the R2000 can branch more efficiently
(assuming the assembler can fill the delay slot), but the Clipper can
have more instructions executing concurrently than the R2000 (4 vs 2)
in in-line code.  Draw your own conclusions about "architectural
efficiency".

>Machine       Mhz    KWhet   KWhet/Mhz
>80287         8      300     40
>32332-32081   15     728     50     (these from Ray Curry,
>32332-32381   15     1200    80     in <3833@nsc.UUCP>)  (projected)
>32332-32310   15     1600    100*      ""     ""        (projected)
>Clipper?      33     1200?   40     guess? anybody know better #?
>68881         12.5   755     60     (from discussion)
>68881         20     1240    60     claimed by Moto, in SUN3-260
>SUN FPA       16.6   1700    100*   DP (from Hough) (in SUN3-160)
>MIPS R2360    8      1160    140*   DP (interim, with restrictions)
>MIPS R2010    8      4500    560    DP (simulated)

John's guess for the Clipper is off by over a factor of two.  The Clipper
FORTRAN compiler was brought up only recently.  With the compiler in its
present sane but unoptimizing state, I obtained the following result on
an Interpro 32C running CLIX System V.3 at 33 Mhz (1 wait state), using
a prototype Green Hills Clipper FORTRAN compiler with Fairchild math
libraries:

              Mhz    Kwhet   Kwhet/Mhz
Clipper       33     2920    Who cares?

Kwhet/Kg and Kwhet/cm2 are of more practical consequence.

Kevin D. Kissell
Fairchild Advanced Processor Division
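
For readers who want to check the figures in this article, here is a
small sketch that recomputes the quoted latencies from clock counts and
cycle times, and the size of the error in the quoted Clipper guess.  The
names are illustrative only; every number is taken from the text above,
with the Clipper cycle time rounded to 30ns as in the article.

```python
# Cycle times in ns as quoted in the article.
CYCLE_NS = {"R2000": 125, "Clipper": 30}

def latency_ns(machine, clocks):
    """Latency of an instruction that takes `clocks` cycles on `machine`."""
    return clocks * CYCLE_NS[machine]

# Clipper: load takes 4 or 6 clocks, branch takes 9 (cache hits assumed).
clipper_load = (latency_ns("Clipper", 4), latency_ns("Clipper", 6))
clipper_branch = latency_ns("Clipper", 9)

# R2000: a load or branch can take as little as two clocks.
r2000_load_or_branch = latency_ns("R2000", 2)

# Whetstone: quoted guess vs. the measured result reported above.
guess_kwhet = 1200
measured_kwhet = 2920
guess_error_factor = measured_kwhet / guess_kwhet

print("Clipper load (ns):        ", clipper_load)
print("Clipper branch (ns):      ", clipper_branch)
print("R2000 load or branch (ns):", r2000_load_or_branch)
print(f"Measured / guessed KWhet:  {guess_error_factor:.2f}")
```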