Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!labtam!graeme
From: graeme@labtam.labtam.oz (Graeme Gill)
Newsgroups: comp.benchmarks
Subject: Re: X-terminal benchmarks
Message-ID: <5572@labtam.labtam.oz>
Date: 14 Nov 90 03:16:52 GMT
References: <90315.122715HANK@BARILVM.BITNET> <43091@mips.mips.COM>
Organization: Labtam Information Systems Pty. Ltd., Melbourne, Australia
Lines: 84

In article <43091@mips.mips.COM>, rnovak@mips.com (Robert E. Novak) writes:
> Are the sources for the benchmarks available?
> 
	From the look of the tests, xbench was probably used to produce
these figures. The other widely known test suite is x11perf. xbench, as
it was distributed over the network, has some problems. A major
one is that it neither disables nor consumes the NoExpose events
generated by its copy area tests. This can cause strange results as
the host starts running out of memory trying to store all the events.
There are also other peculiarities, e.g. the invert rectangles test
only does one rectangle at a time, hence it is partially a measure
of network latency rather than pure invert area performance. This is
inconsistent with the other tests.
	x11perf is most useful for developmental work on servers,
although it is possible to use its results to draw conclusions about the
relative performance of an X server. I make use of x11perf extensively 
in verifying the results of my server optimisations. This often involves
modifying and/or adding tests to the x11perf suite.

> One of the base requirements for all SPEC benchmarks (as you may have
> guessed) is that the results of a program can be mechanically verified
> against known good results.

	This could be a difficult issue. Although the X specifications
specify exact pixelisation rules for most graphical operations, some
are deliberately relaxed - e.g. zero width lines - so that machine
dependent hardware can be used. Since zero width lines are widely used
by X applications, one cannot simply leave out these tests. The MIT X11R3
example server (which a number of vendors' products are still based on) does
not even meet the pixelisation rules for some operations. Verification
would be almost impossible to do at the same time as speed testing,
since efficient use of the X protocol calls for doing as many graphics
operations as possible per packet, and reading back an image is a
relatively slow operation. If you really wanted to 'cook' the results,
a server could save all the commands and only render them on receiving
a GetImage request. The performance of all operations except GetImage
would then seem exceptionally fast. Verification of the graphics
rendering is only part of the problem, as other commands would
also have to be verified - e.g. window creation, cursor operation,
exposures etc.
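
	To make the 'cooking' trick concrete, here is a toy sketch in
Python. It is not X code: the class names, the fill_rect/get_image
methods and the in-memory frame buffer are all made up for
illustration, and rectangles are assumed to have non-negative
coordinates. It only shows that a server which queues requests looks
fast on everything except the read-back.

```python
import time


class HonestServer:
    """Toy server (hypothetical) that renders each request on arrival."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[0] * width for _ in range(height)]
        self.rendered = 0  # number of requests actually drawn so far

    def _render(self, x, y, w, h, value):
        x1 = min(x + w, self.width)          # clip to the right edge
        for row in self.pixels[y:y + h]:     # slicing clips the bottom edge
            row[x:x1] = [value] * max(x1 - x, 0)
        self.rendered += 1

    def fill_rect(self, x, y, w, h, value):
        self._render(x, y, w, h, value)

    def get_image(self):
        return [row[:] for row in self.pixels]


class CookedServer(HonestServer):
    """Toy server that only queues requests and draws on get_image()."""

    def __init__(self, width, height):
        super().__init__(width, height)
        self.pending = []

    def fill_rect(self, x, y, w, h, value):
        self.pending.append((x, y, w, h, value))  # nothing is drawn here

    def get_image(self):
        for request in self.pending:
            self._render(*request)
        self.pending.clear()
        return super().get_image()


def time_fills(server, count):
    """Wall-clock seconds to issue `count` fill requests, no read-back."""
    start = time.perf_counter()
    for i in range(count):
        server.fill_rect(i % 8, i % 8, 16, 16, i & 1)
    return time.perf_counter() - start
```

	Calling time_fills() on both servers and then get_image() on each
should give identical images, with the cooked server reporting a much
lower fill time. A benchmark that never reads back cannot tell the two
apart.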

	It would certainly be very useful to have an X pixelisation
verification suite, but this seems to be a difficult project,
as the closest thing available from the MIT consortium is a partially
completed X protocol verification suite. If such a tool were available
then one could use it to verify the correct functioning of the
device under test, and then run the performance benchmarks,
but whether this is what you are looking for, I don't know.
	The other slight possibility would be to come up with a series
of operations that leave you with a (hopefully) unique pattern that
can then be verified.
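
	A rough sketch of that last idea, in Python rather than real X
calls. The frame buffer, invert_rect() and draw_pattern() are
hypothetical stand-ins; the point is only that overlapping inverts
leave a final image that depends on every operation, so one digest of
the read-back image can be compared against a known good value. It
does nothing for operations whose pixelisation is deliberately relaxed.

```python
import hashlib

WIDTH, HEIGHT = 32, 32  # arbitrary size for the illustration


def new_frame():
    return [[0] * WIDTH for _ in range(HEIGHT)]


def invert_rect(frame, x, y, w, h):
    """Invert a 1-bit rectangle, clipped to the frame."""
    for row in frame[y:y + h]:
        for col in range(x, min(x + w, WIDTH)):
            row[col] ^= 1


def draw_pattern(frame, skip=None):
    """Overlapping inverts; `skip` drops one operation to mimic a bug."""
    for i in range(8):
        if i != skip:
            invert_rect(frame, i * 2, i * 3, 10 + i, 5 + i)


def frame_digest(frame):
    """Digest of the whole image, as one read-back would provide it."""
    data = bytes(pixel for row in frame for pixel in row)
    return hashlib.sha256(data).hexdigest()
```

	The reference digest would come from a server already trusted to
be correct. Only one image read-back is needed at the end of the run,
so the cost of verification stays out of the timed section.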

	The current benchmarking tools only cover a small fraction
of the spectrum of drawing operations that may differ markedly
in speed on a particular X server. For instance, X allows all
16 boolean logical operations, but generally only fill and invert
are tested by benchmarks. Many servers will special-case these
two ops as they are the ones used by applications the vast majority of
the time. The speed of textured fills will vary markedly with the
size of the texture pattern used, since servers may have two or three
different algorithms depending on whether a line of the pattern will fit
in a register or whether it is even, and therefore doesn't need bit shifting.
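
	As an illustration of why only two of the sixteen functions tend
to be fast, here is a sketch in Python of a generic raster-op path next
to a special-cased one. The function codes follow the X protocol's GC
function values (copy is 3, xor is 6, invert is 10); the Python
function names and the 32-bit word mask are assumptions of this sketch,
not anything taken from a real server.

```python
GX_CLEAR, GX_COPY, GX_XOR, GX_INVERT, GX_SET = 0x0, 0x3, 0x6, 0xA, 0xF

WORD_MASK = 0xFFFFFFFF  # assume 32 one-bit pixels per word


def apply_rop_generic(function, src, dst, mask=WORD_MASK):
    """Combine one word of source and destination for any of the 16
    functions. Each bit of the function code selects one minterm."""
    result = 0
    if function & 0x1:
        result |= src & dst
    if function & 0x2:
        result |= src & ~dst
    if function & 0x4:
        result |= ~src & dst
    if function & 0x8:
        result |= ~src & ~dst
    return result & mask


def apply_rop_fast(function, src, dst, mask=WORD_MASK):
    """Special-case copy (plain fill) and invert, else fall back."""
    if function == GX_COPY:
        return src & mask       # destination never needs to be read
    if function == GX_INVERT:
        return ~dst & mask      # source never needs to be read
    return apply_rop_generic(function, src, dst, mask)
```

	A benchmark that times only copy and invert exercises the two
short paths and says nothing about the other fourteen functions, which
all go through the generic code in this sketch.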
	Performance will vary widely depending on the number of
operations that can be grouped together (i.e. the size of the poly
fill rect request etc.), the frequency of sync commands and so on.
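
	A back-of-envelope model of the grouping effect, in Python. The
function and all three default timing constants are invented for
illustration and are not measurements of any server or network; only
the shape of the result matters.

```python
def seconds_per_rect(rects_per_request, requests_per_sync,
                     render_s=2e-6,            # hypothetical draw time
                     request_overhead_s=50e-6,  # hypothetical per-request cost
                     round_trip_s=2e-3):        # hypothetical sync round trip
    """Average cost of one rectangle when requests are batched and a
    sync (one round trip) is issued every `requests_per_sync` requests."""
    per_request = request_overhead_s / rects_per_request
    per_sync = round_trip_s / (rects_per_request * requests_per_sync)
    return render_s + per_request + per_sync


if __name__ == "__main__":
    for rects, syncs in [(1, 1), (1, 100), (100, 1), (100, 100)]:
        cost = seconds_per_rect(rects, syncs)
        print(f"{rects:4d} rects/request, sync every {syncs:4d} requests: "
              f"{cost * 1e6:10.2f} us per rectangle")
```

	With one rectangle per request and a sync after each, the round
trip dominates and the figure says more about the network than about
the drawing code. Only with large batches and rare syncs does the
result approach the raw rendering time.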

	Benchmarking of X terminals is especially difficult since many
of the results will depend on the speed and exact implementation of the
host machine's communication interface - e.g. Ethernet, TCP/IP etc. -
and how that interacts with the server communications.

	X servers can have notoriously uneven performance, so that
two applications that make different demands on the X server may
vary markedly in relative speed when running on different servers.

	In summary, trying to benchmark X servers in a fair way may
make CPU benchmarking look very simple. Considerable investigation
of the issues that may affect performance is needed. If perfect
verification of operation is needed, then benchmarking may not be
possible at all.

	Graeme Gill
	Labtam I.S. Pty. Ltd.
	graeme@labtam.oz.au