Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!think!rutgers!sri-spam!sri-unix!hplabs!oliveb!glacier!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: net.arch
Subject: Re: Floating point performance
Message-ID: <722@mips.UUCP>
Date: Mon, 13-Oct-86 05:21:47 EDT
Article-I.D.: mips.722
Posted: Mon Oct 13 05:21:47 1986
Date-Received: Tue, 14-Oct-86 06:25:24 EDT
References: <340@euroies.UUCP> <1989@videovax.UUCP>
Reply-To: mash@mips.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 88

In article <1989@videovax.UUCP> stever@videovax.UUCP (Steven E. Rice) writes:
>In article <340@euroies.UUCP>, Roger Shepherd (shepherd@euroies.UUCP)
>writes:
>>  .....  For example, I have single length Whetstone
>> figures as follows for these machines
>>                                kWhets  MWhets/MFLOP
>>                                                 (normalised)
>> Intel 80286/80287 (8 Mhz)       300     3.2     1.0
>> NS 30032 & 32081  (10 Mhz)      128     1.3     0.4
>> MC 68020 & 68881  (16 & 12.5)   755     2.5     0.8
>>
>> Inmos IMS T414B-20              663     7.4     2.3
>>
>> The final column gives some feel for how effective these
>> processor/co-processor (just processor for the T414)
>> combinations are at turning MFLOPS into usable floating
>> point performance.
>
>As one who has looked at the relative merits of various processors
>and coprocessors before making a selection, I am not at all concerned
>about "how effective [a] processor/co-processor combination[] [is] at
>turning MFLOPS into usable floating point performance."  The bottom
>line for an application is closely tied to the numbers in the "kWhets"
>column.  The real question is, "How fast will it run my application?"
--THE RIGHT QUESTION!-----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note that this discussion is very akin to the "peak Mips" versus
"sustained Mips" versus "how fast does it run real programs" argument
in the integer side of the world.
I think both Roger and Steven have some useful points and, in fact,
don't seem to disagree very much:
	1) (Roger): MFLOPS don't mean very much. (see (1) below, etc.)
	2) (Steven): and neither do Whetstones!
	3) (Roger): proposes Whetstones / (peak MFLOPS) as an
	   architectural measure.

Note that most vendors spec MFLOPS using cached, back-to-back adds with
both arguments already in registers.  For real programs, one also needs
to measure the effects of:
	a) Coprocessor interaction, i.e., can you load/store directly to
	   the coprocessor from memory, or do you need to copy arguments
	   thru the CPU?  (This can make a large difference.)
	b) Pipelining/overlap effects.
	c) Number of FP registers.
	d) Compiler effects.

(1) In general, peak MFLOPS don't seem to mean too much.  Whetstones
seem to test the FP libraries more than anything else (although this at
least measures SOMETHING a bit more real).

(2) A lot of people like LINPACK MFLOPS ratings, or Livermore Loops,
although the former, at least, also measures the memory system very
strongly, i.e., it's bigger than almost any cache, and that's quite
characteristic of some codes, and totally uncharacteristic of others.

(3) However, a useful attribute of Roger's measure (or a variant
thereof) is that looking at the measure (units of real performance) per
MHz gives you some idea of architectural efficiency, i.e., fewer cycles
per unit of delivered performance is better (so a larger KWhet/MHz
figure is better), in that cycle time is likely to be a property of the
technology, and hard to improve at a given level of technology.  [This
is clearly a RISC-style argument of reducing the cycle count for
delivered performance, and then letting technology carry you forward.]
Using the numbers above, one gets KiloWhets / MHz, for example:

Machine		MHz	KWhet	KWhet/MHz
80287		8	300	40
32332-32081	15	728	50	(these from Ray Curry,
32332-32381	15	1200	80	in <3833@nsc.UUCP>) (projected)
32332-32310	15	1600	100*	""    ""         (projected)
Clipper?	33	1200?	40	guess? anybody know better #?
68881		12.5	755	60	(from discussion)
68881		20	1240	60	claimed by Moto, in SUN3-260
SUN FPA		16.6	1700	100*	DP (from Hough) (in SUN3-160)
MIPS R2360	8	1160	140*	DP (interim, with restrictions)
MIPS R2010	8	4500	560	DP (simulated)

The *'d ones are boards / controllers for Weitek parts.  The KWhet/MHz
numbers were heavily rounded: 1-2 digits of accuracy is about all you
can extract from this, at best.  One can argue about the clock speed
that should be used for the 68881 systems, since the associated 68020
runs faster than the 68881.

What you do see is (not surprisingly) that heavily microcoded designs
get fewer KWhet/MHz than those that either use the Weitek parts or are
not microcoded.

As usual, whether you think this means anything or not depends on
whether or not you think Whetstones are a good measure.  If not, it
would help to see other things proposed.  For some reason, floating
point benchmarks seem to vary pretty strongly in their behavioral
patterns.  Also, if anybody has better numbers, it would be nice to see
them.  At least some of the ones in the list above are of uncertain
parentage.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
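
For anyone who wants to redo the KWhet/MHz arithmetic from the table
above, here is a minimal sketch in Python.  The clock and KWhet figures
are copied straight from the table (including the guessed and projected
entries); the names MACHINES, kwhet_per_mhz and ratios are made up for
illustration only, and no rounding is applied, so the output will differ
slightly from the heavily rounded column.

```python
# (machine, clock in MHz, KWhetstones) as listed in the table in the post.
# Entries marked guessed/projected/simulated there are no more reliable here.
MACHINES = [
    ("80287", 8.0, 300),
    ("32332-32081", 15.0, 728),
    ("32332-32381", 15.0, 1200),
    ("32332-32310", 15.0, 1600),
    ("Clipper?", 33.0, 1200),
    ("68881 (12.5)", 12.5, 755),
    ("68881 (20)", 20.0, 1240),
    ("SUN FPA", 16.6, 1700),
    ("MIPS R2360", 8.0, 1160),
    ("MIPS R2010", 8.0, 4500),
]


def kwhet_per_mhz(kwhet, mhz):
    """KWhetstones delivered per MHz of clock (larger = fewer cycles per unit of work)."""
    return kwhet / mhz


def ratios(machines=MACHINES):
    """Map each machine name to its unrounded KWhet/MHz figure."""
    return {name: kwhet_per_mhz(kwhet, mhz) for name, mhz, kwhet in machines}


if __name__ == "__main__":
    for name, value in ratios().items():
        print(f"{name:<14} {value:7.1f}")
```

The unrounded ratios sit within rounding distance of the table's
column, e.g. 300/8 = 37.5 against the listed 40.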