Path: utzoo!attcan!uunet!oddjob!uwvax!rutgers!ucsd!ames!claris!apple!voder!pyramid!prls!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: NSIEVE C Source File (long)
Message-ID: <2899@winchester.mips.COM>
Date: 25 Aug 88 01:50:51 GMT
References: <1070@marlin.NOSC.MIL> <1073@marlin.NOSC.MIL>
Reply-To: mash@winchester.UUCP (John Mashey)
Distribution: comp.arch
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 161

In article <1073@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes:

>/****************** Start NSIEVE C Source Code *************************/

It is a good thing to publish benchmark sources so people can check themr,
and Alfred should commended for doing so.
Note, however, that this is not the most meaningful benchmark.
See gory details below, including architectural analysis.

>/************************************************************************/
>/* Some Results:                                                        */
>/* Array Size   ------------------ BenchTime (sec) -------------------- */
>/*  (Bytes)     SUN 3/280   VAX 8600     Turbo-Amiga                    */
>/*            68020 @25 MHz           68020 @ 14.32 MHz                 */
>/*              ( cc -O )   ( cc -O )  ( cc +2 +L +ff )                 */ 
>/*      8191        0.267      0.250         0.461  ( 0.264 )           */
>/*     10000        0.300      0.283         0.578  ( 0.331 )           */
>/*     20000        0.650      0.783         1.195  ( 0.684 )           */
>/*     40000        1.333      1.767         2.383  ( 1.365 )           */
>/*     80000        2.917      3.750         4.820  ( 2.761 )           */
>/*    160000        7.833      8.033         9.758  ( 5.589 )           */
>/*    320000       17.600     17.717        ------     |                */
>/*                                                     |                */
>/*                Scaled to 25 MHz --------------------+                */

>/*                                                                      */
>/* Average Run Time relative to 10 Iterations and 8191 array size:      */
>/*                  0.315      0.345         0.484                      */

A. Output from MIPS R2000
	(in M/120: 16.7MHz, 128K cache, low-latency memory system)

   Sieve of Eratosthenes (scaled to 10 Iterations)

   Array Size   Primes   Last Prime   BenchTime
    (Bytes)     Found                   (Sec)
      8191        1899        16381       0.130
     10000        2261        19997       0.150
     20000        4202        39989       0.320
     40000        7836        79999       0.630
     80000       14683       160001       1.270
    160000       27607       319993       2.580
    320000       52073       639997       5.570

   Relative to 10 Iterations and the 8191 Array Size:
   Average BenchTime =    0.131 (sec)

Now, some more data on this, or WHY YOU MUST BE CAREFUL WITH SMALL BENCHMARKS!

B. First, architectural data:
One can see (look for the (#) codes in the data below.
	(1) The instructions fit in a 32-word I-cache.
	(2) Essentially 100% of loads and stores are bytes.
		A more typical number might be 10%.
	(3) 75% of the loads+stores are stores.  A more typical number
		is 30-35%.
	(4) 100% of the loads had NO useful work schedulable in the
	load-delay slot, i.e., in this case, a load is actually costing
	2 cycles (+ any cache miss time).  A more typical number if 30%
	(and this is very germane to using this benchmark on RISC machines
	with pipelining).
	(5) ~100% of the loads/stores have zero-offsets; a more typical
	number is 10-15%.  [relevant to 29K, for example]


pixstats nsieve:
 159336572 (1.000) instructions
   6392573 (0.040) loads
  20105454 (0.126) stores
  26498027 (0.166) loads+stores		(2)
  26498114 (0.166) data bus use
  26484254 (0.166) partial word references (2)
  33974425 (0.213) branches

0.999 load nops per load (4)
0.759 stores per memory reference	(3)
0.999 partial word references per reference (2)

Instruction concentration:
         1   8.6%
         2  17.2%
         4  34.4%
         8  55.1%
        16  87.1%
        32 100.0%   (1)

Load/store displacement   16.6% (5)  (the size says how many bits used)
size                0-extend                1-extend sign-extend
   0  26482960 100.0% 100.0%        29   0.0%   0.0%
   1        36   0.0% 100.0%         0   0.0%   0.0%  100.0%
   2         0   0.0% 100.0%        29   0.0%   0.0%  100.0%
   3      1912   0.0% 100.0%         8   0.0%   0.0%  100.0%
   4      1654   0.0% 100.0%         0   0.0%   0.0%  100.0%
   9         2   0.0% 100.0%         0   0.0%   0.0%  100.0%
  10        18   0.0% 100.0%         0   0.0%   0.0%  100.0%
  11        92   0.0% 100.0%         0   0.0%   0.0%  100.0%
  12       602   0.0% 100.0%         0   0.0%   0.0%  100.0%
  13        60   0.0% 100.0%         0   0.0%   0.0%  100.0%

C.  Now, what the program was doing:

Profile listing generated Wed Aug 24 18:11:22 1988 with:
----------------------------------------------------------------------------
*  -p[rocedures] using basic-block counts;                                 *
*  sorted in descending order by the number of cycles executed in each     *
*  procedure; unexecuted procedures are excluded                           *
----------------------------------------------------------------------------
159336572 cycles
    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line
 159286753   99.97  99.97   22755251     14 SIEVE (nsieve.c)
     24351    0.02  99.98         43     29 _flsbuf (flsbuf.c)
     16695    0.01  99.99       1392     18 _doprnt (doprnt.c)
	...rest are in the noise, obviously.....

----------------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                                      *
*  sorted in descending order by the number of cycles executed in each     *
*  line; unexecuted lines are excluded                                     *
----------------------------------------------------------------------------
(source code with line numbering is shown following)

procedure (file)                           line bytes     cycles      %  cum %

SIEVE (nsieve.c)                            241    12   41148840  25.83  25.83
SIEVE (nsieve.c)                            240     8   27432560  17.22  43.04
SIEVE (nsieve.c)                            231    20   25527710  16.02  59.06
SIEVE (nsieve.c)                            235    16   25527640  16.02  75.08
SIEVE (nsieve.c)                            244    16   25527640  16.02  91.11
SIEVE (nsieve.c)                            230     4    6381910   4.01  95.11
SIEVE (nsieve.c)                            238    16    4422440   2.78  97.89
SIEVE (nsieve.c)                            237     8    2211220   1.39  99.27
SIEVE (nsieve.c)                            242     4    1105610   0.69  99.97
228		for(i=0 ; i<=size ; i++)
229		{
230		   *(flags+i) = true;
231	        }
233			for(i=0 ; i<=size ; i++)
234			{
235				if(*(flags+i))
236				{
237				   prime = i + i + 3;
238					for(k = i + prime ; k<=size ; k+=prime)
239					{
240					*(flags+k)=false;
241					}
242	                           count++;
243	                        }
244	                }


D. Summary
	The statistics of this benchmark bear little resemblance to
	most substantial programs.  This doesn't mean it's a bad benchmark,
	just that extreme care must be taken with it or any other small
	benchmark.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086