Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!rutgers!cunixf.cc.columbia.edu!shenkin
From: shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin)
Newsgroups: comp.sys.sgi
Subject: Re: SGI GL matrix performance -- more benchmarks, this time on a PI
Message-ID: <1991Apr27.163323.22778@cunixf.cc.columbia.edu>
Date: 27 Apr 91 16:33:23 GMT
Organization: Columbia University
Lines: 55

In article <15407@helios.TAMU.EDU> jamie@archone.tamu.edu (James Price) writes:
>Has anyone done any benchmarking of the SGI matrix functions?  I was curious
>and wrote the program included below....

Jamie:  You ought to tell us what kind of Iris "fritz" is, and also what 
version of IRIX you're running.  But in any case, I ran your benchmark on 
avogadro, a 4d25tg running 4.2.1, with the following results.  (Yours included 
for comparison.)  I see that avogadro is faster than fritz in every regard.

I note that compiling matperf.c with increasing levels of optimization
(-O2 and -O3) SLOWS DOWN the hardware performance -- and even, in some cases, 
the software performance (!) -- CONSIDERABLY.  Can anyone explain this?  I 
only did one run each, but these differences are BIG, and I've noted them 
in the table with exclamation points.  This is highly distressing, since one
wants to compile with high optimization to get the max out of one's own
code, and I'd hate to think that doing so necessarily slows down graphics
performance.

I note that with -O2 and -O3, software performance is far better than even
the best-case hardware performance, at least if one needs to get the
results back.  :-)  Thus I conclude that at least for my machine, it doesn't 
make sense to do matrix multiplication using the graphics pipeline, except 
in the context of graphics.  Another conclusion, at least on my machine:
stay away from -O3 !
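
A minimal sketch of what the software side of such a timing loop might look like is given here, for readers without Jamie's source at hand.  It is not matperf.c: the names Mat4, mat4_mul and time_software are hypothetical, and it times only a plain triple-loop 4x4 multiply using clock().  The checksum exists so that the results are consumed rather than left unused.

```c
#include <time.h>

typedef float Mat4[4][4];   /* same 4x4 float layout as the GL Matrix type */

/* c = a * b, plain triple loop; c must not alias a or b. */
static void mat4_mul(Mat4 a, Mat4 b, Mat4 c)
{
    int i, j, k;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (k = 0; k < 4; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Run `reps` multiplies and return the CPU seconds used.
   *checksum receives the sum of c[0][0] over all iterations,
   so the results are actually read back. */
static double time_software(long reps, double *checksum)
{
    Mat4 a, b, c;
    int i, j;
    long n;
    double sum = 0.0;
    clock_t start, stop;

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            a[i][j] = (float)(i + j + 2);       /* arbitrary test data */
            b[i][j] = (i == j) ? 1.0f : 0.0f;   /* identity, so c == a */
        }

    start = clock();
    for (n = 0; n < reps; n++) {
        mat4_mul(a, b, c);
        sum += c[0][0];
    }
    stop = clock();

    *checksum = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_software(10000, &sum) would match the argument of 10000 used here, assuming that argument is a repetition count.  Since the inputs never change inside the loop, an aggressive optimizer may still hoist the multiply out of it, so timings from a sketch like this should be compared across optimization levels with care.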

Caveat:  My machine does not have <stdlib.h>, so I removed that #include;
I do get compilation warnings about parameter mismatches, but the thing
compiles.  Might this be affecting performance?

I've included the results Jamie reported for comparison.  All are for a
command-line argument of 10000 to Jamie's matperf program.
  
  
Machine:                        fritz       ----------- avogadro -------------
GL version:                     GL4DGT-3.3  ---------- GL4DPIT-3.2 -----------
Matperf Optimization level:     -O1 ??      -O1         -O2         -O3

Software - no optimization:     3.349 sec.  1.860 sec.  0.578 sec.  0.578 sec. 
  
Software - some optimization:   1.130 sec.  0.420 sec.  0.378 sec.  0.359 sec. 
  
Software - more optimization:   0.910 sec.  0.330 sec.  0.359 sec. !0.677 sec.
  
Hardware - preserve CTM:        2.379 sec.  0.890 sec.  0.976 sec.  0.876 sec. 
  
Hardware - destroy CTM:         2.289 sec.  0.820 sec.  1.086 sec.  0.837 sec. 
  
Hardware - abandon results:     0.580 sec.  0.430 sec.  0.539 sec. !0.797 sec. 
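
To put a number on the software-versus-hardware comparison, one can take the ratio of the best hardware time that still returns results to the best software time, per optimization level, from the avogadro columns above.  The sketch below does only that arithmetic; the helper names are hypothetical and the figures are copied from the table.

```c
#include <stddef.h>

/* avogadro timings in seconds, copied from the table.
   Rows: -O1, -O2, -O3.  Columns: no, some, more optimization. */
static const double software[3][3] = {
    {1.860, 0.420, 0.330},
    {0.578, 0.378, 0.359},
    {0.578, 0.359, 0.677}
};

/* Hardware cases that return results: preserve CTM, destroy CTM. */
static const double hardware[3][2] = {
    {0.890, 0.820},
    {0.976, 1.086},
    {0.876, 0.837}
};

/* Smallest of n timings. */
static double smallest(const double *t, size_t n)
{
    double best = t[0];
    size_t i;

    for (i = 1; i < n; i++)
        if (t[i] < best)
            best = t[i];
    return best;
}

/* Best hardware time divided by best software time at compiler
   level `level` (0 = -O1, 1 = -O2, 2 = -O3).
   A value above 1 means software is faster. */
static double hw_over_sw(int level)
{
    return smallest(hardware[level], 2) / smallest(software[level], 3);
}
```

With these figures the ratio falls between roughly 2.3 and 2.7 at all three levels, which is the sense in which software beats the pipeline when the results have to come back.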


	-P.
************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY  10027
(212)854-1418  shenkin@cunixf.cc.columbia.edu(Internet)  shenkin@cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***