Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!rutgers!cunixf.cc.columbia.edu!shenkin
From: shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin)
Newsgroups: comp.sys.sgi
Subject: Re: SGI GL matrix performance -- more benchmarks, this time on a PI
Message-ID: <1991Apr27.163323.22778@cunixf.cc.columbia.edu>
Date: 27 Apr 91 16:33:23 GMT
Organization: Columbia University
Lines: 55

In article <15407@helios.TAMU.EDU> jamie@archone.tamu.edu (James Price) writes:
>Has anyone done any benchmarking of the SGI matrix functions?  I was curious
>and wrote the program included below....

Jamie: You ought to tell us what kind of Iris "fritz" is, and also what version of IRIX you're running. But in any case, I ran your benchmark on avogadro, a 4d25tg running 4.2.1, with the following results. (Yours included for comparison.)

I see that avogadro is faster than fritz in every regard.

I note that compiling matperf.c with increasing levels of optimization (-O2 and -O3) SLOWS DOWN the hardware performance -- and even, in some cases, the software performance (!) -- CONSIDERABLY. Can anyone explain this? I only did one run each, but these differences are BIG, and I've noted them in the table with exclamation points. This is highly distressing, since one wants to compile with high optimization to get the max out of one's own code, and I'd hate to think that doing so necessarily slows down graphics performance.

I note that with -O2 and -O3, software performance is far better than hardware performance is in its best case, at least if one needs to get the results back. :-) Thus I conclude that at least for my machine, it doesn't make sense to do matrix multiplication using the graphics pipeline, except in the context of graphics.

Another conclusion, at least on my machine: stay away from -O3 !

Caveat: My machine does not have , so I removed that #define; I do get compilation warnings about parameter mismatches, but the thing compiles.
Might this be affecting performance?

I've included the results Jamie reported for comparison. All are for a command-line argument of 10000 to Jamie's matperf program.

Machine:                       fritz        ------------- avogadro -------------
GL version:                    GL4DGT-3.3   ----------- GL4DPIT-3.2 ------------
Matperf Optimization level:    -O1 ??       -O1          -O2          -O3

Software - no optimization:    3.349 sec.   1.860 sec.   0.578 sec.   0.578 sec.
Software - some optimization:  1.130 sec.   0.420 sec.   0.378 sec.   0.359 sec.
Software - more optimization:  0.910 sec.   0.330 sec.   0.359 sec.  !0.677 sec.
Hardware - preserve CTM:       2.379 sec.   0.890 sec.   0.976 sec.   0.876 sec.
Hardware - destroy CTM:        2.289 sec.   0.820 sec.   1.086 sec.   0.837 sec.
Hardware - abandon results:    0.580 sec.   0.430 sec.   0.539 sec.  !0.797 sec.

	-P.

************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY 10027
(212)854-1418  shenkin@cunixf.cc.columbia.edu(Internet)  shenkin@cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***