Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC simple load benchmark; amazing ! [Not really]
Message-ID: <39319@mips.mips.COM>
Date: 12 Jun 90 03:35:06 GMT
References: <8019@mirsa.inria.fr>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems
Lines: 132

In article <8019@mirsa.inria.fr> jlf@mirsa.inria.fr (Jean-Louis Faraut) writes:
>Therefore, we are looking at new RISC technology and how big CPU
>manufacturers announced performances are is a matter of wondering for
>me :-?
>I try it with different commands to test CPU, I/O etc ... but for the
>sake of brevity, I'll only present here results obtained from a
>slightly modified version of the famous Bill Joy's test program, where
>RISC are supposed to be better than CISC .
>
>Here is my version of the Joy's program :
>========================================
>#!/bin/sh
>echo 99k65536vvvvvp8op | dc 
>========================================

Well, there is an unfortunate bug in this benchmark,
in that the behavior of this program in no way resembles most code
typically run on general-purpose UNIX systems, and you absolutely do NOT
want to use it to help choose computers unless your workload happens to
be multiple-precision arithmetic doing lots of multiplies and divides.

Specifically, most RISC designers, after studying many programs, decided
that integer multiply (and especially divide) were used less frequently
than many other operations, and there is substantial data that backs this
up from many vendors.  RISC designers, depending on the benchmarks used,
and amount of silicon available, allocated various amounts of silicon to
support these operations, from zero up. The SPARC designers included
a Multiply-Step, and no Divide-Step (i.e., divides are done by a fair-sized
hunk of code); HP PA included M-S and D-S; MIPS & 88K included both
integer mult & divide in hardware, etc.  However, for example, a typical
integer divide on a MIPS takes about 35 cycles.... and probably about
the same on a typical CISC.

IF YOU WANT TO PROVE THAT A RISC IS NOT VERY MUCH BETTER THAN A CISC,
OR EVEN WORSE, AT CPU PERFORMANCE:

Use a program consisting of integer divides of 2 variables that the
optimizers can't get rid of.
	a) This will show a MIPS or 88K at their least advantage.
	b) This will prove that a SPARC is the slowest thing in existence
	(well almost).  While we (MIPS) thought divide was good to have,
	I must defend the SPARCers as not being irrational in leaving it
	out, given the constraints of the first implementation.
Attached to the end of this is a brief sample of the MIPS prof/pixstats
output, which shows that on a MIPS-based machine (i.e., including DEC 5810),
waiting for multiply/divide accounts for about 1/3 of this benchmark's cycles.
This is NOT as bad as the even more classic "2^4096" dc benchmark, but it's
still very high.
(I tried email to jlf, but it bounced for some reason)
jlf: if you want some solid CPU (single-user) data on realistic
benchmarks, call the MIPS Paris office (33-1-42-04-0311) and ask for
"SPEC Data Helps Normalize Vendor Mips-Ratings for Sensible Comparison
OR
Your Mileage May Vary, But If Your Car Were A Computer, It Would Vary More"
Issue 2.0, May 1990.  This is a giant analysis of all the published SPECmark
data, and includes plenty of RISCs & CISCs.....  It doesn't tell you about
multi-user stuff...

PROF & PIXSTATS: stop here if you don't want gory details.
Profile listing generated Mon Jun 11 19:49:24 1990 with:
   prof -pixie dc 
*  -p[rocedures] using basic-block counts;                                 *
*  sorted in descending order by the number of cycles executed in each     *
*  procedure; unexecuted procedures are excluded                           *

18830106 cycles **INSTRUCTION CYCLES, NO CACHE MISSES, STALLS**

    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line

  15732582   83.55  83.55      53695     37 div (dc.c)
   1359698    7.22  90.77       5938     36 mult (dc.c)
    342478    1.82  92.59       3142     32 getdec (dc.c)
    307083    1.63  94.22         32     20 seekc (dc.c)
    294895    1.57  95.79       3597     43 add (dc.c)
**ALL REMAINING FUNCTIONS USED LESS THAN 1% OF THE INSTRUCTION CYCLES**
*  -h[eavy] using basic-block counts;                                      *
*  sorted in descending order by the number of cycles executed in each     *
*  line; unexecuted lines are excluded                                     *

procedure (file)                           line bytes     cycles      %  cum %

div (dc.c)                                  665   144    3260586  17.32  17.32
div (dc.c)                                  657   124    2781234  14.77  32.09
div (dc.c)                                  671    92    1911378  10.15  42.24
div (dc.c)                                  655    72    1648096   8.75  50.99
div (dc.c)                                  664    52    1124340   5.97  56.96
div (dc.c)                                  672    40     805894   4.28  61.24
div (dc.c)                                  658    40     739898   3.93  65.17
div (dc.c)                                  656    16     412024   2.19  67.36
div (dc.c)                                  667    12     337302   1.79  69.15
div (dc.c)                                  659   116     245129   1.30  70.45
mult (dc.c)                                1097   100     233586   1.24  71.69
**ALL REMAINING STATEMENTS USED LESS THAN 1% OF THE INSTRUCTION CYCLES**
pixstats dc:
  27600801 (1.466) cycles (1.1s @ 25.0MHz) **INSTR CYCLES, INCL STALLS**
  18830106 (1.000) instructions ** INSTRUCTIONS **
   6807625 (0.362) loads
   1994843 (0.106) stores
   8802468 (0.467) loads+stores
   8802468 (0.467) data bus use
   8770695 (0.466) multiply/divide interlock cycles (12/35 cycles)
**EXCEPTIONALLY HIGH: VERY FEW REAL PROGRAMS LOOK THIS WAY: 1/3 cycles
	IS WAITING FOR MUL OR DIV TO COMPLETE! **

0.448 load nops per load
**ALSO VERY HIGH: more typical is .25-.30 lnops/load**

opcode distribution:
      lw    6350450   33.72%
    lnop    3051497   16.21%
      sw    1686506    8.96%
    bnop    1237018    6.57%
     bne    1162741    6.17%
    addu     893980    4.75%
   addiu     799847    4.25%
    beqz     719618    3.82%
      lb     449888    2.39%
      li     326381    1.73%
      sb     307835    1.63%
    sltu     303554    1.61%
    subu     256783    1.36%
    mflo     238028    1.26%
     div     238028    1.26%	**YEP, RIGHT UP THERE**
   bcond     139925    0.74%
    bgez     129581    0.69%
    mfhi     114251    0.61%
   multu     114251    0.61%	**AND THERE'S THE MULTIPLY**
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086