Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bloom-beacon!oberon!cit-vax!ucla-cs!zen!ucbvax!hplabs!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch,comp.sys.misc,comp.unix.wizards
Subject: Re: Between a Sun-4 and a Cray-2
Message-ID: <616@winchester.UUCP>
Date: Sun, 23-Aug-87 00:13:29 EDT
Article-I.D.: winchest.616
Posted: Sun Aug 23 00:13:29 1987
Date-Received: Sun, 23-Aug-87 23:46:09 EDT
References: <7500@shemp.UCLA.EDU> <552@winchester.UUCP> <2866@phri.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Distribution: world
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 90
Xref: mnetor comp.arch:1880 comp.sys.misc:773 comp.unix.wizards:3849

In article <2866@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	John sent me that 40-pager ...  a few comments.  First off, as I have
>learned to take with a grain of salt *anything* a computer manufacturer
>says about any other manufacturer's products...
ALWAYS! Healthy skepticism is a Good Thing.

>...The problem is that Sun claims 10 mips performance for the Sun-4, while
>MIPS insists it's more like 7 mips.  That's a 30% difference, which in my
>book just isn't that much for this type of calculation.
Besides the usual caveats that single-number performance metrics are bad,
and that mips-numbers only get used because everybody is forced into quoting them,
this comment raises an interesting issue: how humans perceive differences.
To look at the general case, assume that a vendor claims one thing,
and reality is something else (assuming the reality is something you can
measure, i.e., NOT something like unspecified mips).
Here are some interesting questions:
	1) Do you think of the difference as ((claim - real)/claim * 100) % ?
	(i.e., the percentage of the claim that is missing)
	2) Do you think of it as ((claim - real)/real * 100) % ?
	(i.e., the percentage overstatement)
	3) For your choice of 1) or 2), at what percentage do you think it's
	worth bothering to treat X and Y as different?
	4) Do any of your ideas change as mips ratings go up?  For example,
	maybe it's sensible to distinguish between 3.5 and 4 mips (12-14%),
	but maybe not between 9.5 and 10 (5%), and it seems irrational to
	distinguish between 59.5 and 60 (under 1%).
Obviously, all of these are a matter of opinion.  What I'm curious about is
how human perception works when applied to computer performance
characterization.
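To make questions 1) and 2) concrete, here is a small Python sketch that computes both ways of measuring the gap for the numbers above, including the 10-vs-7 case.  The function names are made up for illustration; the formulas are exactly the ones in 1) and 2).

```python
def shortfall_pct(claim: float, real: float) -> float:
    """Question 1): the percentage of the claim that is missing."""
    return (claim - real) / claim * 100


def overstatement_pct(claim: float, real: float) -> float:
    """Question 2): the percentage by which the claim overstates reality."""
    return (claim - real) / real * 100


# (claimed mips, measured mips) pairs taken from the discussion above.
pairs = [(10, 7), (4, 3.5), (10, 9.5), (60, 59.5)]
for claim, real in pairs:
    print(f"{claim:>5} vs {real:>5}: "
          f"shortfall {shortfall_pct(claim, real):5.1f}%, "
          f"overstatement {overstatement_pct(claim, real):5.1f}%")
```

The 10-vs-7 row shows a 30.0% shortfall but a 42.9% overstatement, so the same gap sounds quite different depending on which denominator you pick.  At 60 vs 59.5 both measures are under 1%.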

>	Also, I don't like the performance graphs in the MIPS report.  The
>graphs have "Performance" on the Y-axis and "Benchmark Type" on the X-axis,
>with lines connecting the data points.  This is wrong, since it indicates
>that "Benchmark Type" is a continuous function, and implies that you can
>interpolate between the data points to find values for intermediate types
>of benchmarks.  These graphs should have been bar graphs, or scatter plots,
>or some other type of plot which is more suitable to discrete data.  BTW,
>the graphs look like they were done using CricketGraph on a Mac.  Very
>pleasing to the eye, even if they are misleading. :-)

Actually, they're a mixture of Cricket, Excel, and MacDraft.
There was absolutely no intention of implying interpolation between points.

The good thing about computer graphics is that you can redraw graphs quickly.
The bad thing about it is that you can redraw them OFTEN.  We tried numerous
different ways to present the data, including various flavors of bar graphs
and scatter plots, and the ones we ended up with, somewhat to my surprise,
were much easier to understand than the others.  Here is the reasoning:
1) The goal is to show performance of machines and benchmarks, with enough
of both together to see how they compare on those benchmarks,
especially to get overall "performance envelopes".
2) If you use the bar graph that shows one benchmark versus a bunch of
machines, you get a good view of that, but you get no idea of overall
comparisons.
3) If you use the bar graph that shows several benchmarks versus a bunch of
machines, the graph gets cluttered very quickly, and it is much
harder to see a-b comparisons.  The most complex graph had 42 data points,
and it is just incomprehensible if done this way.
4) If you use scatter plots, with no lines connecting them, but little
symbols instead, you again find it gets cluttered quickly.
I've seen people end up connecting the symbols to figure out what's
happening.
5) If you use "stem-and-leaf" charts (which I like, akin to the way
I've seen some 8700-780 comparisons done), you get a really good idea
of overall performance envelopes, but it's hard to do a-b comparisons
for specific benchmarks, and it's hard to do more than about 2 machines
at once without really baroque graphics.
6) The line chart makes a few things clear, if you're lucky:
	a) machine A is always faster than machine B, for the set of benchmarks
	(the lines never cross).
	b) machine A is usually faster than B, except on a few benchmarks,
	which are made obvious by line-crossings.
	c) Missing data points are obvious [they're not as obvious on
	most bargraphs].
Point 6c) is important: one of the easiest ways to be misled by the
numbers you get is to have partial sets of data and get excited about
extreme cases without looking at overall patterns.   For example,
if all I saw was Dhrystone and character-pushing, I'd sure think
a Cray was a slow machine. [Well, not slow, but not as fast as the price would suggest.]
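You can also do the line-chart checks in 6a) through 6c) mechanically before drawing anything.  The Python sketch below is a minimal version.  It assumes each machine's results are a dict mapping benchmark name to relative performance (higher is faster).  The function names, benchmark names and numbers are all hypothetical, made up for illustration.

```python
from typing import Optional

# Relative performance per benchmark (higher is faster); None or an absent
# key means that benchmark was not run on that machine.
Perf = dict[str, Optional[float]]


def compare(order: list[str], a: Perf, b: Perf) -> dict[str, list[str]]:
    """Classify each benchmark, in x-axis order, the way the line chart shows it."""
    result: dict[str, list[str]] = {"a_faster": [], "b_not_slower": [], "missing": []}
    for name in order:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result["missing"].append(name)       # 6c: a gap in one of the lines
        elif x > y:
            result["a_faster"].append(name)
        else:
            result["b_not_slower"].append(name)  # 6b: B's line meets or rises above A's
    return result


def a_always_faster(result: dict[str, list[str]]) -> bool:
    """6a: A beats B on every benchmark that both machines ran."""
    return not result["b_not_slower"] and bool(result["a_faster"])


# Hypothetical numbers, for illustration only.
order = ["bench1", "bench2", "bench3", "bench4"]
a = {"bench1": 5.0, "bench2": 6.0, "bench3": 2.0, "bench4": 7.0}
b = {"bench1": 4.0, "bench2": 5.5, "bench3": 3.0}
res = compare(order, a, b)
print(res)
print(a_always_faster(res))
```

With these made-up numbers, A wins on bench1 and bench2, B wins on bench3 (the line crossing of 6b), and bench4 is flagged as missing for B (6c).  An empty "missing" list is worth checking before believing any single comparison.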

For this whole area, I cannot recommend the following book highly
enough.  In my opinion, it's in a league with "The Elements of Style":

Edward R. Tufte, "The Visual Display of Quantitative Information",
Graphics Press, Cheshire, Connecticut, 1983.  About $35, worth more.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086