Newsgroups: comp.benchmarks
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!hellgate.utah.edu!cs.utah.edu!thomson
From: thomson@cs.utah.edu (Rich Thomson)
Subject: VGX benchmark redux
Date: 9 Apr 91 15:46:16 MDT
Message-ID: <1991Apr9.154616.1976@hellgate.utah.edu>
Followup-To: comp.benchmarks
Keywords: VGX, GPC, PLB
Organization: Computer Science Department, University of Utah, SLC, UT
References: <1991Mar28.213128.9355@hellgate.utah.edu> <1991Apr1.154902.17858@odin.corp.sgi.com>

In article <1991Mar28.213128.9355@hellgate.utah.edu>, I posted a VGX
benchmark program mailed to me by Brian McClendon of SGI.  I have
recently found the time to take a close look at this program and the
one posted by Kurt Akeley.  I have yet to try this particular program
out on a VGX machine, and I will postpone that effort for this
particular program.

If we examine the code of the program, we find that the polygons it is
attempting to display are created with the following loop:

    #define SQRT3_2 (1.7321/2.0)

    /* initialize data arrays */
    for (i=0; i<(1 + NUMTRI/2); i++) {
        tribuf[i*8+0] = size*i;
        tribuf[i*8+1] = 0;
        tribuf[i*8+2] = 0;
        tribuf[i*8+4] = size*i + size/2;
        tribuf[i*8+5] = size*SQRT3_2;
        tribuf[i*8+6] = 0;
    }
    [...]
    bgntmesh();
    for(i=0;i<(1 + NUMTRI/2);i++) {
        n3f(&normbuf[(i%2)*4]);
        v3f(&tribuf[i*8]);
        n3f(&normbuf[(i%4)*4]);
        v3f(&tribuf[i*8 + 4]);
    }
    endtmesh();
    closeobj();

Notice that this creates a big, linear triangle strip that stretches
off the right side of the screen (especially if the triangles are the
50-pixel triangles quoted in the marketing literature).  This results
in most of the triangles being clipped from the view volume.

The program that Kurt Akeley posted in article
<1991Apr1.154902.17858@odin.corp.sgi.com> was much more reasonable.  It
created a certain number of triangles per strip, with each strip being
linear, but with all the strips beginning at the same position relative
to the display window:

    /* initialize data arrays */
    for (i=0; i>1)) + (float)(offset*(i&1));
        meshbuf[VERTSIZE*i+5] = 10.0 + (float)(size*(i&1));
        meshbuf[VERTSIZE*i+6] = 0.0;
        meshbuf[VERTSIZE*i+7] = 0;
    }
    [...]
    #define LIGHTVERT(i) n3f(fp+(VERTSIZE*(i))); v3f(fp+(VERTSIZE*(i))+4)
    for (i=events; i>0; i--) {
        fp = meshbuf;
        bgntmesh();
        LIGHTVERT(0);
        LIGHTVERT(1);
        LIGHTVERT(2);
        endtmesh();
    }

Now on to some comments on Kurt's article:

> We take our graphics performance claims very seriously here at Silicon
> Graphics.

I'm sure you take them as seriously as MIPS, HP, and IBM take their
SPECmark ratings.  Sadly, the graphics community does not yet have the
equivalent of the SPECmark rating with which to intelligently compare
different platforms.  Just look at the claims made when comparing X
implementations.  The customer gets left in the lurch unless they
undertake analyzing the voluminous output of x11perf to find out the
real story.

I began to be skeptical when I saw the figure posted several times on
comp.graphics and queries to the poster were answered with "it's from
our marketing literature, I'll ask a ``tech type'' to send you a
program" (I never heard back from him).  Also, at a recent VGX
demonstration at the U, the rep couldn't tell me details about the
figure, nor could he show me a program with a high polygon rate.  He
also didn't have any models with several hundred thousand polygons
(say, 40% of the peak figure, or 300K - 400K polygons), although he's a
sharp enough man that I imagine he WILL have them next time in case I'm
there. ;-}

Hopefully, when the Graphics Performance Characterization (GPC)
committee releases its Picture Level Benchmark program (& numbers come
forth from vendors) this situation will be alleviated.
For now, we are stuck with comparing performance numbers from each
different vendor and attempting to infer useful comparisons from widely
differing measures.  For instance, you say:

> [quoted performance comes from] tuned programs that use ONLY
> commands that are available in the Graphics Library.

So these numbers are highly tuned for the architecture of the VGX and
are reproducible only with a vendor-specific library.  This is very
understandable, given the position SGI holds in the 3D market, but it
is very difficult to compare different platforms with these kinds of
numbers in hand.  [Perhaps that is the intention of the marketing
dept? ;-]

> I ran this program on my 5-span VGX with the following results:
>   size=8, offset=4, zbuffer(1), events=500000, lighting=1
>   running on cashew, GL4DVGX-4.0, Fri Mar 29 15:22:58 1991
>   Triangle mesh performance (lighted):
>      1 triangles per mesh: 189393 triangles per second
      [stuff deleted]
>     30 triangles per mesh: 675648 triangles per second
>     62 triangles per mesh: 714240 triangles per second
>   Display listed triangle mesh (lighted):
>     62 triangles per mesh: 769181 triangles per second
>   Display listed triangle mesh (colored):
>     62 triangles per mesh: 1020342 triangles per second

I find this interesting.  Apparently, the way to max out the VGX is to
use display lists.  I thought SGI considered display lists "naughty".
Several times on comp.graphics, SGI folks have bashed display-list
oriented techniques, and the company's position paper on "PEX & PHIGS"
states over and over the advantages of immediate mode over display-list
techniques.  I find it particularly ironic, then, that the 1 M p/s
number comes from display-list techniques.

Another poster asked about how things change when lights are turned on,
etc.  I think Kurt's table (along with examining the source) answers
this question.  Naturally, the more lights are turned on, the slower
things get (can't compute everything instantaneously).
Also, I notice that these polygons aren't depth cued, which would also
reduce the numbers somewhat (naturally, as stated they are PEAK
numbers).

> Note that performances of well over 1 million triangles per second are
> achieved for long meshes of single- and multi-colored triangles, with
> the zbuffer enabled.  When lighting and smooth shading are enabled, the
> performance drops to roughly 3/4 of a million triangles per second.

I notice that the zbuffer was enabled, but that the Z test was set to
ZF_ALWAYS.  I can imagine a good microcoder optimizing that case so as
to not perform the read-modify-write cycle to the Z buffer (since the
test will always win anyway).  Is a r-m-w cycle taking place, or is it
just being written through?

Thanks again Kurt for clarifying these mysteries!

						-- Rich
Rich Thomson	thomson@cs.utah.edu  {bellcore,hplabs,uunet}!utah-cs!thomson
``Read my MIPs -- no new VAXes!!'' --George Bush after sniffing freon