Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!decwrl!sgi!shinobu!odin!bam
From: bam@sgi.com (Brian McClendon)
Newsgroups: comp.sys.sgi
Subject: Re: Relative GL costs
Message-ID: <1991Apr5.014022.28620@odin.corp.sgi.com>
Date: 5 Apr 91 01:40:22 GMT
References: <9104041941.AA29344@ge-dab.GE.COM>
Sender: news@odin.corp.sgi.com (Net News)
Organization: Silicon Graphics, Inc.
Lines: 63

In article <9104041941.AA29344@ge-dab.GE.COM> "dwilliam@larry.ATL.GE.COM"@andrew.dnet.ge.com writes:
>"Howard C. Smith" writes:
>> Does anyone have numbers as to the relative cost of
>> particular GL calls? (for each machine in the 4D series). Maybe all
>> normalized as a percentage of gconfig (presumably the most
>> expensive).
>>
>> Howard Smith
>> smith@nextone.niehs.nih.gov
>>
>
>/*
> * this might be what you are looking for.
> * let me know if you make any interesting enhancements.
> * compile with:
> * cc -prototypes -acpp -O -s glbench.c -lm -lgl_s -lc_s -o glbench
> *
> * dan (dwilliams@atl.ge.com)
> *
> * GL benchmarking results sorted numerically for a 210GTX:
> *
> * swapbuffers : 61 calls per second

It's hard to derive a true cost for a GL routine when it involves the
hardware gfx pipeline. Because the bottleneck can be deep in the pipe,
with lots of FIFO-ing in between, pixie/prof results _can_ be very
misleading. If you write a benchmark program (like glbench.c) and run
the same primitive over and over, then you _should_ get a reasonable
idea of the cost of a particular primitive (as long as you do a finish()
to flush the pipe or do enough iterations that the depth of the pipe is
insignificant).

Unfortunately there are exceptions to the above. Swapbuffers & gsync
wait for the next vertical retrace, so benchmarking them is difficult.
I do know they each make a system call, but the whole routine shouldn't
take more than 100 usecs itself (leaving you 16.56... msec to draw at a
60 Hz frame rate).
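The repeat-and-flush approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: time_primitive, stub_primitive, stub_flush and frame_budget_msec are hypothetical names, not GL routines, and the two stubs merely stand in for a real primitive and for a wrapper around finish(). It measures wall-clock time with POSIX clock_gettime rather than CPU time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: any no-argument routine to be timed. */
typedef void (*gl_call_fn)(void);

/* Stubs standing in for a real GL primitive and a finish() wrapper. */
long stub_calls = 0;
long stub_flushes = 0;
void stub_primitive(void) { stub_calls++; }
void stub_flush(void) { stub_flushes++; }

/*
 * Average wall-clock seconds per call of draw(), flushing once before
 * the timed loop and once after it so queued work is counted.
 */
double time_primitive(gl_call_fn draw, gl_call_fn flush, long iterations)
{
    struct timespec start, end;
    double elapsed;
    long i;

    assert(draw != NULL && flush != NULL && iterations > 0);

    flush();                                /* start with an empty pipe */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        draw();                             /* same primitive, over and over */
    flush();                                /* wait for the pipe to drain */
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed / (double)iterations;
}

/* Milliseconds left to draw in one frame after a fixed overhead. */
double frame_budget_msec(double refresh_hz, double overhead_usec)
{
    return 1000.0 / refresh_hz - overhead_usec / 1000.0;
}
```

In a real GL program the flush callback would wrap finish() and the draw callback would issue the primitive under test. frame_budget_msec(60.0, 100.0) comes to roughly 16.57 msec, in line with the figure above.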
Also, benchmarking mapcolor on some machines is difficult due to the
way mapcolor was microcoded. Here are some real numbers for mapcolor
performance.

	VGX:	31750 slots/sec
	GTX:	 7400
	G:	 2200
	PI:	 4000

The problem with these is that their inverse is _not_ the cost of the
routine on most machines, because when inserted in a stream of unrelated
cmds (that happen not to tickle the same bit of hardware) the cost may
drop down to a usec or less.

On a dumb frame buffer most of this would be very easy because there is
only one processor, but on the VGX there can be 11, some in parallel,
some in series.

--
----------------------------------------------------------------------------
Brian McClendon bam@rudedog.SGI.COM ...!uunet!sgi!rudedog!bam 415-335-1110
----------------------------------------------------------------------------