Newsgroups: comp.benchmarks
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!hellgate.utah.edu!cs.utah.edu!thomson
From: thomson@cs.utah.edu (Rich Thomson)
Subject: VGX benchmark redux
Date: 9 Apr 91 15:46:16 MDT
Message-ID: <1991Apr9.154616.1976@hellgate.utah.edu>
Followup-To: comp.benchmarks
Keywords: VGX, GPC, PLB
Organization: Computer Science Department, University of Utah, SLC, UT
References: <1991Mar28.213128.9355@hellgate.utah.edu> <1991Apr1.154902.17858@odin.corp.sgi.com> 

In article <1991Mar28.213128.9355@hellgate.utah.edu>, I posted a VGX
benchmark program mailed to me by Brian McClendon of SGI.  I have
recently found the time to take a close look at this program and the
one posted by Kurt Akeley.

I have yet to try this particular program out on a VGX machine, and I
will postpone that effort for now.  If we examine the code of the
program, we find that the polygons it is attempting to display are
created with the following loop:

#define SQRT3_2	(1.7321/2.0)
    /* initialize data arrays */
    for (i=0; i<(1 + NUMTRI/2); i++) {
	tribuf[i*8+0] = size*i;
	tribuf[i*8+1] = 0;
	tribuf[i*8+2] = 0;
	tribuf[i*8+4] = size*i + size/2;
	tribuf[i*8+5] = size*SQRT3_2;
	tribuf[i*8+6] = 0;
    }

    [...]

    bgntmesh();
    for(i=0;i<(1 + NUMTRI/2);i++)
    {
    	n3f(&normbuf[(i%2)*4]);
    	v3f(&tribuf[i*8]);
    	n3f(&normbuf[(i%4)*4]);
    	v3f(&tribuf[i*8 + 4]);
    }
    endtmesh();
    closeobj();

Notice that this creates a big, linear triangle strip that stretches
off the right side of the screen (especially if the triangles are the
50-pixel triangles quoted in the marketing literature).  This results
in most of the triangles being clipped from the view volume.

The program that Kurt Akeley posted in article
<1991Apr1.154902.17858@odin.corp.sgi.com> was much more reasonable.  It
created a certain number of triangles per strip, with each strip being
linear, but with all the strips beginning at the same position
relative to the display window:

    /* initialize data arrays */
    for (i=0; i<MAXVERTEX; i+=1) {
	meshbuf[VERTSIZE*i+0] = (i&1) ? 0.0 : 1.0;
	meshbuf[VERTSIZE*i+1] = 0.0;
	meshbuf[VERTSIZE*i+2] = (i&1) ? 1.0 : 0.0;
	meshbuf[VERTSIZE*i+3] = 0;
	meshbuf[VERTSIZE*i+4] = 10.0 + (float)(size*(i>>1)) +
				       (float)(offset*(i&1));
	meshbuf[VERTSIZE*i+5] = 10.0 + (float)(size*(i&1));
	meshbuf[VERTSIZE*i+6] = 0.0;
	meshbuf[VERTSIZE*i+7] = 0;
    }

	[...]

#define LIGHTVERT(i) n3f(fp+(VERTSIZE*(i))); v3f(fp+(VERTSIZE*(i))+4)
    for (i=events; i>0; i--) {
	fp = meshbuf;
	bgntmesh();
	LIGHTVERT(0);
	LIGHTVERT(1);
	LIGHTVERT(2);
	endtmesh();
    }

Now on to some comments on Kurt's article:

> We take our graphics performance claims very seriously here at Silicon
> Graphics.

I'm sure you take them as seriously as MIPS, HP, and IBM take their
SPECmark ratings.  Sadly, the graphics community does not yet have the
equivalent of the SPECmark rating with which to intelligently compare
different platforms.  Just look at the claims made when comparing X
implementations.  The customer gets left in the lurch unless they
undertake to analyze the voluminous output of x11perf to find out the
real story.

I began to be skeptical when I saw the figure posted several times on
comp.graphics and a query to the poster was answered with "it's from our
marketing literature, I'll ask a ``tech type'' to send you a program"
(I never heard back from him).  Also, at a recent VGX demonstration at
the U, the rep couldn't tell me details about the figure, nor could he
show me a program with a high polygon rate.  He also didn't have any
models with several hundred thousand polygons (say, 40% of the peak
figure, or 300K - 400K polygons), although he's a sharp enough man
that I imagine he WILL have them next time in case I'm there. ;-}

Hopefully, when the Graphics Performance Committee releases its
Picture Level Benchmark program (& numbers come forth from vendors)
this situation will be alleviated.  For now, we are stuck with
comparing performance numbers from each different vendor and
attempting to infer useful comparisons from widely differing measures.

For instance, you say:
> [quoted performance comes from] tuned programs that use ONLY
> commands that are available in the Graphics Library.

So these numbers are highly tuned for the architecture of the VGX and
are reproducible only with a vendor-specific library.  This is very
understandable, given the position SGI holds in the 3D market, but it
is very difficult to compare different platforms with only these kinds
of numbers in hand.  [Perhaps that is the intention of the marketing
dept? ;-]

> I ran this program on my 5-span VGX with the following results:
>    size=8, offset=4, zbuffer(1), events=500000, lighting=1
>    running on cashew, GL4DVGX-4.0, Fri Mar 29 15:22:58 1991
>    Triangle mesh performance (lighted):
>       1 triangles per mesh: 189393 triangles per second
[stuff deleted]
>      30 triangles per mesh: 675648 triangles per second
>      62 triangles per mesh: 714240 triangles per second
>    Display listed triangle mesh (lighted):
>      62 triangles per mesh: 769181 triangles per second
>    Display listed triangle mesh (colored):
>      62 triangles per mesh: 1020342 triangles per second

I find this interesting.  Apparently, the way to max out the VGX is to
use display lists.  I thought SGI considered display lists "naughty".
Several times on comp.graphics, SGI folks have bashed display-list
oriented techniques and the company's position paper on "PEX & PHIGS"
states over and over the advantages of immediate mode over display-list
techniques.  I find it particularly ironic then that the 1 M p/s
number comes from display-list techniques.

Another poster asked about how things change when lights are turned
on, etc.  I think Kurt's table (along with examining the source)
answers this question.  Naturally, the more lights are turned on, the
slower things get (can't compute everything instantaneously).  Also, I
notice that these polygons aren't depth cued, which would also reduce
the numbers somewhat (naturally, as stated they are PEAK numbers).

> Note that performances of well over 1 million triangles per second are
> achieved for long meshes of single- and multi-colored triangles, with
> the zbuffer enabled.  When lighting and smooth shading are enabled, the
> performance drops to roughly 3/4 of a million triangles per second.

I notice that the zbuffer was enabled, but that the Z test was set to
ZF_ALWAYS.  I can imagine a good microcoder optimizing that case so as
to not perform the read-modify-write cycle to the Z buffer (since the
test will always win anyway).  Is a r-m-w cycle taking place, or is it
just being written through?

Thanks again Kurt for clarifying these mysteries!

						-- Rich
Rich Thomson	thomson@cs.utah.edu  {bellcore,hplabs,uunet}!utah-cs!thomson
    ``Read my MIPs -- no new VAXes!!''  --George Bush after sniffing freon