Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!decwrl!sgi!shinobu!odin!bam
From: bam@sgi.com (Brian McClendon)
Newsgroups: comp.sys.sgi
Subject: Re: Relative GL costs
Message-ID: <1991Apr5.014022.28620@odin.corp.sgi.com>
Date: 5 Apr 91 01:40:22 GMT
References: <9104041941.AA29344@ge-dab.GE.COM>
Sender: news@odin.corp.sgi.com (Net News)
Organization: Silicon Graphics, Inc.
Lines: 63

In article <9104041941.AA29344@ge-dab.GE.COM> "dwilliam@larry.ATL.GE.COM"@andrew.dnet.ge.com writes:
>"Howard C. Smith" <smith@nextone.niehs.nih.gov> writes:
>> 	Does anyone have numbers as to the relative cost of  
>> particular GL calls? (for each machine in the 4D series). Maybe all  
>> normalized as a percentage of gconfig (presumably the most  
>> expensive). 
>>
>> 	Howard Smith
>> 	smith@nextone.niehs.nih.gov
>> 
> 
>/* 
> * this might be what you are looking for.
> * let me know if you make any interesting enhancements.
> * compile with:
> *    	cc -prototypes -acpp -O -s glbench.c -lm -lgl_s -lc_s -o glbench
> *
> * dan (dwilliams@atl.ge.com)
> *
> * GL benchmarking results sorted numerically for a 210GTX:
> * 
> * swapbuffers                         :      61 calls per second


It's hard to derive a true cost for a GL routine when it involves
the hardware gfx pipeline.  Because the bottleneck can be deep in the
pipe, with lots of FIFO-ing in between, pixie/prof results _can_ be
very misleading.

If you write a benchmark program (like glbench.c) and run the same
primitive over and over, then you _should_ get a reasonable idea
of the cost of a particular primitive (as long as you do a finish()
to flush the pipe or do enough iterations that the depth of the pipe is
insignificant).
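
A minimal sketch of that kind of timing loop, in plain C so it runs
without graphics hardware.  The names time_primitive, prim_fn and
count_call are made up for illustration and are not part of GL or of
glbench.c; the clock is POSIX clock_gettime(), which is my assumption
about what is available, not what glbench.c uses.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: wraps whatever call is being measured. */
typedef void (*prim_fn)(void *ctx);

/* Wall-clock seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Issue `prim` `iters` times, then call `drain` once (if given) so the
 * clock stops only after queued work is done.  Returns the average
 * wall-clock seconds per call.  With a large `iters`, whatever is
 * still sitting in the pipe matters less and less.
 */
double time_primitive(prim_fn prim, prim_fn drain, void *ctx, long iters)
{
    double t0, t1;
    long i;

    assert(prim != NULL && iters > 0);
    t0 = now_sec();
    for (i = 0; i < iters; i++)
        prim(ctx);
    if (drain != NULL)
        drain(ctx);
    t1 = now_sec();
    return (t1 - t0) / (double)iters;
}

/* Stand-in primitive: just counts how often it was called. */
void count_call(void *ctx)
{
    ++*(long *)ctx;
}
```

On a real machine, prim would wrap the GL call under test and drain
would wrap finish().  Wall-clock time is used on purpose: CPU time
would miss the time spent waiting on the pipe.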

Unfortunately there are exceptions to the above.  Swapbuffers & gsync
wait for the next vertical retrace, so benchmarking them is difficult.
I do know they each make a system call, but the whole routine shouldn't
take more than 100 usecs itself (leaving you 16.56... msec to draw at a
60 Hz frame rate).
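
The arithmetic behind that figure, as a small sketch.  The function
name draw_budget_usec is made up for illustration; the 100 usec
overhead is the rough upper bound given above, not a measurement.

```c
/*
 * Time left to draw in one frame, in microseconds: one refresh
 * period minus the assumed overhead of the swap call itself.
 */
double draw_budget_usec(double refresh_hz, double swap_overhead_usec)
{
    return 1.0e6 / refresh_hz - swap_overhead_usec;
}
```

draw_budget_usec(60.0, 100.0) comes to about 16566.7 usec.  It also
suggests why a tight swapbuffers loop reports roughly 61 calls per
second in the quoted results: that is close to the refresh rate, which
is what you would expect if the loop is mostly waiting for retrace
rather than measuring the routine.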

Also, benchmarking mapcolor on some machines is difficult due to the
way mapcolor was microcoded. Here are some real numbers for mapcolor
performance.

VGX:	31750 slots/sec
GTX:	 7400 slots/sec
G:	 2200 slots/sec
PI:	 4000 slots/sec

The problem with these is that their inverse is _not_ the cost of
the routine on most machines, because when mapcolor is inserted in a
stream of unrelated commands (that happen not to tickle the same bit
of hardware) the cost may drop down to a usec or less.
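
For reference, here is what the naive inverse works out to, as a
sketch.  The struct and function names are made up for illustration;
the rates are the ones listed above.  Treat the result as the
back-to-back cost only, i.e. an upper bound, not the cost in a mixed
command stream.

```c
#include <stddef.h>
#include <stdio.h>

/* Measured back-to-back mapcolor throughput, from the table above. */
struct mapcolor_rate {
    const char *machine;
    double slots_per_sec;
};

static const struct mapcolor_rate rates[] = {
    { "VGX", 31750.0 },
    { "GTX",  7400.0 },
    { "G",    2200.0 },
    { "PI",   4000.0 },
};

/* Naive inverse of throughput: usec per slot when issued back to back. */
double back_to_back_usec(double slots_per_sec)
{
    return 1.0e6 / slots_per_sec;
}

/* Print the back-to-back figure for each machine in the table. */
void print_mapcolor_bounds(void)
{
    size_t i;

    for (i = 0; i < sizeof rates / sizeof rates[0]; i++)
        printf("%-4s %8.1f usec/slot (back to back)\n",
               rates[i].machine,
               back_to_back_usec(rates[i].slots_per_sec));
}
```

That gives roughly 31 usec per slot on the VGX up to about 455 usec on
the G, far above the usec-or-less figure you can see when the call is
interleaved with unrelated work.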

On a dumb frame buffer most of this would be very easy because there
is only one processor, but on the VGX there can be 11, some in parallel,
some in series.

--
----------------------------------------------------------------------------
 Brian McClendon bam@rudedog.SGI.COM ...!uunet!sgi!rudedog!bam 415-335-1110
----------------------------------------------------------------------------