Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!wuarchive!uwm.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!ejk
From: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Newsgroups: comp.arch
Subject: yet-another-benchmark
Message-ID: <1989Nov12.160221.26921@ux1.cso.uiuc.edu>
Date: 12 Nov 89 16:02:21 GMT
Sender: news@ux1.cso.uiuc.edu (News)
Reply-To: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Followup-To: yet-another-benchmark
Organization: University of Illinois at Urbana
Lines: 204

Here is an updated list of results. Thanks for the responses.
-------------------------------------------------------------------------------
Attached is yet-another-benchmark that might cast some light on aspects
of architecture.  As with all benchmarks, there is a very serious question 
of relevance to one's own applications. However, unlike many others, it 
is small enough to see in detail what is being measured.

The numbers reported are trips/processor_second through the loop below.
Because each trip depends on the x and y produced by the previous trip, the
calculation does not seem to lend itself to vector/parallel enhancements.

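   #include <math.h>             /* sqrt, fabs */

   int W, H, np, mxp, nP, mxP;   /* bitmap size; points set/limit; trips/limit */
   double A, B, C;               /* fractal parameters */
   char *bmap;                   /* bitmap, one bit per pixel */

   void hopalong() {
      int wc=W/8, cx=W/2, cy=H/2, ix, iy;     /* bytes per row; bitmap center */
      double x=0, y=0, xx, yy, t;

      while (np < mxp && ++nP < mxP) {
         t = sqrt(fabs(B*x-C));
         xx = y - ( (x<0) ? t : -t );
         yy = A - x;
         x = xx; y = yy;
         ix = cx + x; iy = cy + y;
         if (ix>-1 && iy>-1 && ix<(W-1) && iy<(H-1)) {    /* point in range */
            bmap[iy*wc+(ix>>3)] |= 1<<(ix&7);             /* record the point */
            np++;
            }
         }
      }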

The loop builds a bitmap of a fractal for display in an X root window.
It uses Barry Martin's algorithm, published in A.K. Dewdney's "Computer
Recreations" column in the September 1986 Scientific American.
-------------------------------------------------------------------------------
Newsgroups: comp.windows.x
From: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Subject: xfroot timing update
Date: Sun, 12 Nov 89 15:36:56 GMT

Here is the 6th updated list of xfroot fractal-points/processor_second
measured on various clients. Each number, a count of trips/second
through the 9-line "hopalong" loop in xfroot, is a rough index of scalar
double-precision floating-point uniprocessor speed. The lower number
represents a case where nearly all points are in range and thus require
additional integer arithmetic, bit manipulation, and memory accesses to
record the point. The higher number reflects a case where most points are
out of range and most time is spent in floating-point arithmetic.
   
      Key:   () : Vax 780 equivalents
             *  : For a single processor
             +  : Using hardware square root
	     >  : New since last posting

      Lower bound     Upper bound        Client (compiler/options)
      304000 (56.2)   619000(100.3)*     Cray 2    (scc)         
      316000 (58.4)   476000 (77.1)*     Cray Y-MP (scc)         
      283000 (52.3)   415000 (67.3)*     Cray X-MP (scc)         
      185000 (34.2)   263000 (42.6)*+  > Apollo DN10000 (See note below)
      143000 (26.4)   195000 (31.6)*+    ETA-10 G                
      157000 (29.0)   194000 (31.4)*     Cray X-MP (cc)          
      129000 (23.8)   183000 (29.7)*     Cray 2    (cc)          
      174000 (32.2)   182000 (29.5)*   > Amdahl 5990             
      115000 (21.3)   170000 (27.6)*   > Apollo DN10000 (-D_BUILTINS)
      117000 (21.6)   151000 (24.5)*+    Convex C2 (gcc)         
      108000 (20.0)   144000 (23.3)      SGI Iris 4D/240 (-lfastm)
      108000 (20.0)   138000 (22.4)*+    Convex C2 (vc3/fastmath)
       99000 (18.3)   118000 (19.1)*+    Convex C2 (vc3)         
       95000 (17.6)   115000 (18.6)      DEC DS5800              
       89000 (16.5)   111000 (18.0)      SGI Iris 4D/240         
       73000 (13.5)    94000 (15.2)+     Sun 4/370 (f77/libm.i1) 
       66000 (12.2)    92000 (14.9)      HP9000/835CHX           
       78000 (14.4)    92000 (14.9)    > Sony NWS-3860           
       77000 (14.2)    91000 (14.7)      DEC DS5400              
       58000 (10.7)    75000 (12.2)      DEC DS3100              
       61000 (11.3)    70000 (11.3)      Tektronix XD88/30       
       52000  (9.6)    69000 (11.2)+     Sun 4/280               
       58000 (10.7)    67000 (10.9)      Solbourne Series5 Cypress  
       49000  (9.1)    60000  (9.7)*     Gould NP1               
       50000  (9.2)    57000  (9.2)      DEC Vax 6400 (vcc)      
       49000  (9.1)    55000  (8.9)*     Convex C2 (vc2)         
       45000  (8.3)    54000  (8.8)      SGI Iris 4D/70-GT       
       43000  (7.9)    53000  (8.6)    > Sun SPARCstation 1 (see note below)
       42000  (7.8)    48000  (7.8)      Sun 4/370 (libm.i1)     
       41000  (7.6)    47000  (7.6)*     Convex C2 (cc)          
       41000  (7.6)    47000  (7.6)      Sun 4/370               
       28000  (5.2)    33000  (5.3)      DEC Vax 8650
       28000  (5.2)    33000  (5.3)    > Stellar GS 2000 (-O2)   
       26500  (4.9)    30300  (4.9)    > Mac II (w/ Siclone 3033)
       20800  (3.8)    28900  (4.7)      Sun SPARCstation 1 (see note below)
       24000  (4.4)    28000  (4.5)      HP9000/370 (ffpa)       
       24700  (4.6)    27800  (4.5)      Sun SPARCstation 1 (gcc)
       22800  (4.2)    27100  (4.4)      Titan                   
       22900  (4.2)    26100  (4.2)      DEC MV3900 (vcc)        
       19900  (3.7)    25200  (4.1)      Sun SPARCstation 1          
       17200  (3.2)    24200  (3.9)      DG AViiON (88k 16.7 MHz)
       22300  (4.1)    23700  (3.8)      386/33 + 387   (cc 386/ix)
       21100  (3.9)    23600  (3.8)      Sun 4/260               
       20100  (3.7)    23400  (3.8)*     Sequent Symmetry (fpa)  
       19700  (3.6)    23200  (3.8)      DEC Vax 8530
       21000  (3.9)    23000  (3.7)      Sun 4/280               
       19700  (3.6)    22400  (3.6)      DEC Vax 8600
       19500  (3.6)    21600  (3.5)    > Apollo DN4500 (-D_BUILTINS)
       16800  (3.1)    19200  (3.1)      DEC Vax 6220            
       16800  (3.1)    17600  (2.9)      386/33 + 387 (gcc 1.35) 
       15400  (2.8)    17500  (2.8)      DEC MV3200 (vcc)        
       15200  (2.8)    17400  (2.8)      IBM RT 135 (-f2 -lfm)   
       14500  (2.7)    17400  (2.8)      DEC MV3600 (vcc)        
       15900  (2.9)    17300  (2.8)      HP9000/370              
       13800  (2.6)    16100  (2.6)    > Apollo DN3550 (-D_BUILTINS)
       13900  (2.6)    16000  (2.6)      IBM RT125 (afpa)        
       13800  (2.6)    15900  (2.6)    > Apollo DN3500 (-D_BUILTINS)
       13700  (2.5)    15200  (2.5)      HP9000/360              
       13200  (2.4)    15200  (2.5)      DEC Vaxserver 3500      
       13000  (2.4)    15100  (2.4)      DEC Vaxstation 3100
       14000  (2.6)    14800  (2.4)      Sun 386i/250 Weitek (cc)
       12900  (2.4)    14000  (2.3)      Sun 3/60 (-O4 lib/f68881)
       11900  (2.2)    13200  (2.1)    > Apollo DN4000 (-D_BUILTINS)
       11000  (2.0)    12900  (2.1)    > Apollo DN2500 (-D_BUILTINS)
       10500  (1.9)    12700  (2.1)      Sun 3/50 (gcc 68881)     
        9700  (1.8)    12100  (2.0)    > Mac II                  
       10600  (2.0)    11500  (1.9)      IBM RT 135               
       10500  (1.9)    11500  (1.9)      HP9000/350               
        9900  (1.8)    10500  (1.7)*     Sequent Symmetry         
        9200  (1.7)     9700  (1.6)      IBM RT 115 (4.3BSD High C 2.1)
        8000  (1.5)     8750  (1.4)      Sun 3/60 (-f 68881)      
        7930  (1.5)     8670  (1.4)    > HP9000/340                
        7000  (1.3)     8200  (1.3)      386/25 + 387   (cc 386/ix)
        7300  (1.3)     8000  (1.3)      IBM RT 115 (4.3BSD High C 1.4)
        7280  (1.3)     7910  (1.3)      HP9000/330 (HP-UX 6.5 cc) 
        7200  (1.3)     7600  (1.2)      IBM RT 125                
        5530  (1.0)     6330  (1.0)      DEC Vaxstation 2000/vcc   
        5730  (1.1)     6230  (1.0)      HP9000/330                
        6000  (1.1)     6200  (1.0)      386/25 + 387   (gcc)      
        5410  (1.0)     6170  (1.0)      DEC Vax 780               
        5580  (1.0)     6150  (1.0)      HP9000/320                
        5560  (1.0)     6120  (1.0)    > Apollo DN3000 (-D_BUILTINS)
        5480  (1.0)     6080  (1.0)      Sun 3/50 (-f 68881)       
        4670  (0.9)     5530  (0.9)      DEC Vaxstation 2000       
        4160  (0.8)     5210  (0.8)      DEC MVII   (cc)           
        4080  (0.8)     5070  (0.8)      DEC MVII   (vcc)          
        1960  (0.4)     2060  (0.3)      Sun 3/60                  
        1270  (0.2)     1330  (0.2)      Sun 3/50                  
         ???  (???)      950  (0.2)      Sun 3/160 (no fpa)        
         530  (0.1)      560  (0.1)      Sun 2/120 (no fpu - cc) 
         340  (0.1)      360  (0.1)      DEC Vax 730               
         259  (0.0)      260  (0.0)      386/25 (386/ix - no 387)  
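
The parenthesized "Vax 780 equivalents" appear to be each figure divided by
the DEC Vax 780 entry in the same column (5410 and 6170). That reading is
inferred from the table rather than stated anywhere, and the helper name
vax_equivalents is hypothetical:

```c
/* Hypothetical helper: express a trips/second figure as a multiple of
   the DEC Vax 780 figure from the same column of the table. */
double vax_equivalents(double trips_per_sec, double vax780_trips_per_sec)
{
   return trips_per_sec / vax780_trips_per_sec;
}
```

For the Cray 2 (scc) row, vax_equivalents(304000, 5410) is about 56.2 and
vax_equivalents(619000, 6170) is about 100.3, matching the table.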

A few notes on the results: 
     
     o The top DN10000 timings used the PRISM 6.7(359) compiler with
       the following options: -opt 4 -cpu a88k -def sqrt=_builtin_sqrt
       -def fabs=_builtin_fabs.

     o Two SPARCstation results using sqrt.i1 and libm.i1 were reported.
       The only difference appeared to be that the faster one was compiled
       and linked in one step. Can anyone enlighten us on this?

     o The Cray scc compiler uses the same backend as Cray's Fortran compiler.

     o The gcc improvements are due to inline code for sqrt and fabs.

     o Strikingly different results for the same system show that it pays to 
       shop around for the best compiler/options/libraries available.

Thanks to:  archer@sgi.com, bauer@loligo.cc.fsu.edu, bav@hobbes.ksu.ksu.edu, 
bryan%kewill@uunet.uu.net, bt@irfu.se, casey@gauss.llnl.gov, csmith@convex.com,
csu@alembic.acs.com, eric@geology.tn.cornell, dave@rutgers.edu, 
david@torsqnt.uucp, evans@decvax.dec.com, garyc@quasi.wv.tek.com, 
glenn@mathcs.emory.edu, harrison@decwrl.dec.com, hleroy@erisa.fr, 
howard@aic.hrl.hac.com, hrp@boring.cray.com, idallen@watgcl.waterloo.edu, 
jpb@sn2024.cray.com, jw@pan.uucp, ken@cs.toronto.edu, kline@ux1.cso.uiuc.edu, 
ksp@maxwell.nde.swri.edu, kucharsk@uts.amdahl.com, lnz@lucid.com,
mark@zok.uucp, markw@airgun.wg.waii.com, michael@ws.sony.co.jp,
moraes@csri.toronto.edu, paul@db0tui66.bitnet, rauletta@gmuvax2.gmu.edu,
skam@solbourne.com, sommerfeld@apollo.com, steved@longs.lance.colostate.edu,
tac@csl.ncsu.edu, thp@westhawk.uucp, tony@popserver.stanford.edu,
tpf@jdyx.uucp, wesommer@athena.mit.edu, zimet@sequoia.berkeley.edu,
for sharing their results.
(Please assume the standard disclaimers for all.)

I would appreciate hearing about measurements on other clients, or results
differing significantly from those above.  To perform your own:

	1. Get xfroot/part01 (V5-I3) and xfroot/patch1 (V5-I7) from
	   comp.sources.x. These are available via anonymous ftp from
	   uunet.uu.net. While they will eventually be found there in
	   comp.sources.x/volume5, as of this writing they are in
	   comp.sources.x/new/890924.0.Z and 890929.0. If you don't
	   have ftp access to uunet.uu.net, I will be happy to mail
	   a copy (~700 lines.)
	2. Install xfroot on the client to be tested, taking care
	   that you have verified the definition of HZ in xfroot.c.
	   (See the README.)
	3. Make the following two runs:

	      xfroot -a 0.1 -b 0.1 -c 0.1    (lower bound)
	      xfroot -a 3000 -b 3000 -c 3000 (upper bound)

Please mention any details (compilers/libraries/options) you think are
relevant.
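
On the HZ check in step 2: assuming xfroot uses HZ to convert clock ticks
from times() into processor seconds (an assumption; the README is the
authority), a wrong value would scale every result. On a POSIX system the
tick rate can be queried at run time instead. The helper name cpu_seconds is
hypothetical, and counting user plus system time is a choice made for this
sketch, not necessarily what xfroot does:

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: processor seconds (user + system) consumed so
   far by this process, using the tick rate reported by the system
   rather than a compiled-in HZ. Returns -1.0 on failure. */
double cpu_seconds(void)
{
   struct tms t;
   long ticks_per_sec = sysconf(_SC_CLK_TCK);

   if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
      return -1.0;
   return (double)(t.tms_utime + t.tms_stime) / ticks_per_sec;
}
```

Comparing sysconf(_SC_CLK_TCK) with the HZ defined in xfroot.c is a quick
sanity check before reporting numbers.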

-------------------------
Ed Kubaitis (ejk@ux1.cso.uiuc.edu)
Computing Services Office - University of Illinois, Urbana