Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!wuarchive!uwm.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!ejk
From: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Newsgroups: comp.arch
Subject: yet-another-benchmark
Message-ID: <1989Nov12.160221.26921@ux1.cso.uiuc.edu>
Date: 12 Nov 89 16:02:21 GMT
Sender: news@ux1.cso.uiuc.edu (News)
Reply-To: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Followup-To: yet-another-benchmark
Organization: University of Illinois at Urbana
Lines: 204

Here is an updated list of results. Thanks for the responses.

-------------------------------------------------------------------------------
Attached is yet-another-benchmark that might cast some light on aspects
of architecture. As with all benchmarks, there is a very serious question
of relevance to one's own applications. However, unlike many others, it is
small enough to see in detail what is being measured. The numbers reported
are trips/processor_second through the loop below. The calculation does not
seem to lend itself to vector/parallel enhancements.

   int W, H, np, mxp, nP, mxP; double A, B, C; char *bmap;

   hopalong() {
      int wc=W/8, cx=W/2, cy=H/2, ix, iy;
      double x=0, y=0, xx, yy, t;

      while (np < mxp && ++nP < mxP) {
         t = sqrt(fabs(B*x-C));
         xx = y - ( (x<0) ? t : -t );
         yy = A - x;
         x = xx;
         y = yy;
         ix = cx + x;
         iy = cy + y;
         if (ix>-1 && iy>-1 && ix<(W-1) && iy<(H-1)) {
            bmap[iy*wc+(ix>>3)] |= 1<<(ix&7);
            np++;
         }
      }
   }

It's building a bitmap of a fractal to display in an X root window.
(Barry Martin algorithm published in A.K. Dewdney's "Computer Recreations"
in the September 86 Scientific American.)
-------------------------------------------------------------------------------

Newsgroups: comp.windows.x
From: ejk@ux1.cso.uiuc.edu (Ed Kubaitis)
Subject: xfroot timing update
Date: Sun, 12 Nov 89 15:36:56 GMT

Here is the 6th updated list of xfroot fractal-points/processor_second
measured on various clients.
The number, a count of trips/second through the 9 line "hopalong" loop in
xfroot, is a rough index of scalar double-precision floating point
uniprocessor speed. The lower number represents a case where nearly all
points are in-range and thus require additional integer arithmetic, bit
manipulation, and memory accesses to record the point. The higher number
reflects a case when most points are out of range and most time is spent
in floating point arithmetic.

Key:  ()  : Vax 780 equivalents
       *  : For a single processor
       +  : Using hardware square root
       >  : New since last posting

 lower  (Vax)   upper   (Vax)  flags system
------ ------  ------ -------  ----- ------------------------------------
304000 (56.2)  619000 (100.3)  *     Cray 2 (scc)
316000 (58.4)  476000  (77.1)  *     Cray Y-MP (scc)
283000 (52.3)  415000  (67.3)  *     Cray X-MP (scc)
185000 (34.2)  263000  (42.6)  *+ >  Apollo DN10000 (See note below)
143000 (26.4)  195000  (31.6)  *+    ETA-10 G
157000 (29.0)  194000  (31.4)  *     Cray X-MP (cc)
129000 (23.8)  183000  (29.7)  *     Cray 2 (cc)
174000 (32.2)  182000  (29.5)  *  >  Amdahl 5990
115000 (21.3)  170000  (27.6)  *  >  Apollo DN10000 (-D_BUILTINS)
117000 (21.6)  151000  (24.5)  *+    Convex C2 (gcc)
108000 (20.0)  144000  (23.3)        SGI Iris 4D/240 (-lfastm)
108000 (20.0)  138000  (22.4)  *+    Convex C2 (vc3/fastmath)
 99000 (18.3)  118000  (19.1)  *+    Convex C2 (vc3)
 95000 (17.6)  115000  (18.6)        DEC DS5800
 89000 (16.5)  111000  (18.0)        SGI Iris 4D/240
 73000 (13.5)   94000  (15.2)  +     Sun 4/370 (f77/libm.i1)
 66000 (12.2)   92000  (14.9)        HP9000/835CHX
 78000 (14.4)   92000  (14.9)     >  Sony NWS-3860
 77000 (14.2)   91000  (14.7)        DEC DS5400
 58000 (10.7)   75000  (12.2)        DEC DS3100
 61000 (11.3)   70000  (11.3)        Tektronix XD88/30
 52000  (9.6)   69000  (11.2)  +     Sun 4/280
 58000 (10.7)   67000  (10.9)        Solbourne Series5 Cypress
 49000  (9.1)   60000   (9.7)  *     Gould NP1
 50000  (9.2)   57000   (9.2)        DEC Vax 6400 (vcc)
 49000  (9.1)   55000   (8.9)  *     Convex C2 (vc2)
 45000  (8.3)   54000   (8.8)        SGI Iris 4D/70-GT
 43000  (7.9)   53000   (8.6)     >  Sun SPARCstation 1 (see note below)
 42000  (7.8)   48000   (7.8)        Sun 4/370 (libm.i1)
 41000  (7.6)   47000   (7.6)  *     Convex C2 (cc)
 41000  (7.6)   47000   (7.6)        Sun 4/370
 28000  (5.2)   33000   (5.3)        Dec Vax 8650
 28000  (5.2)   33000   (5.3)     >  Stellar GS 2000 (-O2)
 26500  (4.9)   30300   (4.9)     >  Mac II (w/ Siclone 3033)
 20800  (3.8)   28900   (4.7)        Sun SPARCstation 1 (see note below)
 24000  (4.4)   28000   (4.5)        HP9000/370 (ffpa)
 24700  (4.6)   27800   (4.5)        Sun SPARCstation 1 (gcc)
 22800  (4.2)   27100   (4.4)        Titan
 22900  (4.2)   26100   (4.2)        DEC MV3900 (vcc)
 19900  (3.7)   25200   (4.1)        Sun SPARCstation 1
 17200  (3.2)   24200   (3.9)        DG AViiON (88k 16.7 MHz)
 22300  (4.1)   23700   (3.8)        386/33 + 387 (cc 386/ix)
 21100  (3.9)   23600   (3.8)        Sun 4/260
 20100  (3.7)   23400   (3.8)  *     Sequent Symmetry (fpa)
 19700  (3.6)   23200   (3.8)        Dec Vax 8530
 21000  (3.9)   23000   (3.7)        Sun 4/280
 19700  (3.6)   22400   (3.6)        Dec Vax 8600
 19500  (3.6)   21600   (3.5)     >  Apollo DN4500 (-D_BUILTINS)
 16800  (3.1)   19200   (3.1)        DEC Vax 6220
 16800  (3.1)   17600   (2.9)        386/33 + 387 (gcc 1.35)
 15400  (2.8)   17500   (2.8)        DEC MV3200 (vcc)
 15200  (2.8)   17400   (2.8)        IBM RT 135 (-f2 -lfm)
 14500  (2.7)   17400   (2.8)        DEC MV3600 (vcc)
 15900  (2.9)   17300   (2.8)        HP9000/370
 13800  (2.6)   16100   (2.6)     >  Apollo DN3550 (-D_BUILTINS)
 13900  (2.6)   16000   (2.6)        IBM RT125 (afpa)
 13800  (2.6)   15900   (2.6)     >  Apollo DN3500 (-D_BUILTINS)
 13700  (2.5)   15200   (2.5)        HP9000/360
 13200  (2.4)   15200   (2.5)        DEC Vaxserver 3500
 13000  (2.4)   15100   (2.4)        Dec Vaxstation 3100
 14000  (2.6)   14800   (2.4)        Sun 386i/250 Weitek (cc)
 12900  (2.4)   14000   (2.3)        Sun 3/60 (-O4 lib/f68881)
 11900  (2.2)   13200   (2.1)     >  Apollo DN4000 (-D_BUILTINS)
 11000  (2.0)   12900   (2.1)     >  Apollo DN2500 (-D_BUILTINS)
 10500  (1.9)   12700   (2.1)        Sun 3/50 (gcc 68881)
  9700  (1.8)   12100   (2.0)     >  Mac II
 10600  (2.0)   11500   (1.9)        IBM RT 135
 10500  (1.9)   11500   (1.9)        HP9000/350
  9900  (1.8)   10500   (1.7)  *     Sequent Symmetry
  9200  (1.7)    9700   (1.6)        IBM RT 115 (4.3BSD High C 2.1)
  8000  (1.5)    8750   (1.4)        Sun 3/60 (-f 68881)
  7930  (1.5)    8670   (1.4)     >  HP9000/340
  7000  (1.3)    8200   (1.3)        386/25 + 387 (cc 386/ix)
  7300  (1.3)    8000   (1.3)        IBM RT 115 (4.3BSD High C 1.4)
  7280  (1.3)    7910   (1.3)        HP9000/330 (HP-UX 6.5 cc)
  7200  (1.3)    7600   (1.2)        IBM RT 125
  5530  (1.0)    6330   (1.0)        DEC Vaxstation 2000/vcc
  5730  (1.1)    6230   (1.0)        HP9000/330
  6000  (1.1)    6200   (1.0)        386/25 + 387 (gcc)
  5410  (1.0)    6170   (1.0)        DEC Vax 780
  5580  (1.0)    6150   (1.0)        HP9000/320
  5560  (1.0)    6120   (1.0)     >  Apollo DN3000 (-D_BUILTINS)
  5480  (1.0)    6080   (1.0)        Sun 3/50 (-f 68881)
  4670  (0.9)    5530   (0.9)        DEC Vaxstation 2000
  4160  (0.8)    5210   (0.8)        DEC MVII (cc)
  4080  (0.8)    5070   (0.8)        DEC MVII (vcc)
  1960  (0.4)    2060   (0.3)        Sun 3/60
  1270  (0.2)    1330   (0.2)        Sun 3/50
   ???  (???)     950   (0.2)        Sun 3/160 (no fpa)
   530  (0.1)     560   (0.1)        Sun 2/120 (no fpu - cc)
   340  (0.1)     360   (0.1)        DEC Vax 730
   259  (0.0)     260   (0.0)        386/25 (386/ix - no 387)

A few notes on the results:

o The top DN10000 timings used the PRISM 6.7(359) compiler with the
  following options:
     -opt 4 -cpu a88k -def sqrt=_builtin_sqrt -def fabs=_builtin_fabs.

o Two SPARCstation results using sqrt.i1 and libm.i1 were reported. The
  only difference appeared to be that the faster one was compiled and
  linked in one step. Can anyone enlighten us on this?

o The Cray scc compiler uses the same backend as their Fortran.

o gcc enhancements are due to inline code for sqrt & fabs.

o Strikingly different results for the same system show that it pays to
  shop around for the best compiler/options/libraries available.

Thanks to:

   archer@sgi.com, bauer@loligo.cc.fsu.edu, bav@hobbes.ksu.ksu.edu,
   bryan%kewill@uunet.uu.net, bt@irfu.se, casey@gauss.llnl.gov,
   csmith@convex.com, csu@alembic.acs.com, eric@geology.tn.cornell,
   dave@rutgers.edu, david@torsqnt.uucp, evans@decvax.dec.com,
   garyc@quasi.wv.tek.com, glenn@mathcs.emory.edu, harrison@decwrl.dec.com,
   hleroy@erisa.fr, howard@aic.hrl.hac.com, hrp@boring.cray.com,
   idallen@watgcl.waterloo.edu, jpb@sn2024.cray.com, jw@pan.uucp,
   ken@cs.toronto.edu, kline@ux1.cso.uiuc.edu, ksp@maxwell.nde.swri.edu,
   kucharsk@uts.amdahl.com, lnz@lucid.com, mark@zok.uucp,
   markw@airgun.wg.waii.com, michael@ws.sony.co.jp, moraes@csri.toronto.edu,
   paul@db0tui66.bitnet, rauletta@gmuvax2.gmu.edu, skam@solbourne.com,
   sommerfeld@apollo.com, steved@longs.lance.colostate.edu, tac@csl.ncsu.edu,
   thp@westhawk.uucp, tony@popserver.stanford.edu, tpf@jdyx.uucp,
   wesommer@athena.mit.edu, zimet@sequoia.berkeley.edu,

for sharing their results. (Please assume the standard disclaimers for all.)

I would appreciate hearing about measurements on other clients, or results
differing significantly from those above. To perform your own:

1. Get xfroot/part01 (V5-I3) and xfroot/patch1 (V5-I7) from comp.sources.x.
   These are available via anonymous ftp from uunet.uu.net. While they will
   eventually be found there in comp.sources.x/volume5, as of this writing
   they are in comp.sources.x/new/890924.0.Z and 890929.0. If you don't
   have ftp access to uunet.uu.net, I will be happy to mail a copy
   (~700 lines.)

2. Install xfroot on the client to be tested, taking care that you have
   verified the definition of HZ in xfroot.c. (See the README.)

3. Make the following two runs:

      xfroot -a 0.1 -b 0.1 -c 0.1          (lower bound)
      xfroot -a 3000 -b 3000 -c 3000       (upper bound)

Please mention any details (compilers/libraries/options) you think are
relevant.

-------------------------
Ed Kubaitis (ejk@ux1.cso.uiuc.edu)
Computing Services Office - University of Illinois, Urbana