Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bloom-beacon!oberon!cit-vax!ucla-cs!zen!ucbvax!hplabs!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch,comp.sys.misc,comp.unix.wizards
Subject: Re: Between a Sun-4 and a Cray-2
Message-ID: <616@winchester.UUCP>
Date: Sun, 23-Aug-87 00:13:29 EDT
Article-I.D.: winchest.616
Posted: Sun Aug 23 00:13:29 1987
Date-Received: Sun, 23-Aug-87 23:46:09 EDT
References: <7500@shemp.UCLA.EDU> <552@winchester.UUCP> <2866@phri.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Distribution: world
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 90
Xref: mnetor comp.arch:1880 comp.sys.misc:773 comp.unix.wizards:3849

In article <2866@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
> John sent me that 40-pager ... a few comments.  First off, I have
>learned to take with a grain of salt *anything* a computer manufacturer
>says about any other manufacturer's products...

ALWAYS! Healthy skepticism is a Good Thing.

>...The problem is that Sun claims 10 mips performance for the Sun-4, while
>MIPS insists it's more like 7 mips.  That's a 30% difference, which in my
>book just isn't that much for this type of calculation.

Besides the usual issue that single-number performance metrics are bad,
and that mips-numbers only get used because everybody gets forced into it,
this comment raises the interesting issue of human perception of
differences. To look at the general case, assume that a vendor claims one
thing, and reality is something else (assuming the reality is something
you can measure, i.e., NOT something like unspecified mips). Here are some
interesting questions:

1) Do you think of the difference as ((claim - real)/claim * 100) % ?
   (i.e., the percentage of the claim that is missing)
2) Do you think of it as ((claim - real)/real * 100) % ?
   (i.e., the percentage overstatement)
3) For your choice of 1) or 2), at what percentage do you think it's
   worth bothering to treat X & Y as different?
4) Do any of your ideas change as mips ratings go up? For example, maybe
   it's sensible to distinguish between 3.5 and 4 mips (12-14%), but
   maybe not between 9.5 and 10 (5%), and it seems irrational to
   distinguish between 59.5 and 60.

Obviously, all of these are a matter of opinion. What I'm curious about
is how human perception works when applied to computer performance
characterization.

> Also, I don't like the performance graphs in the MIPS report.  The
>graphs have "Performance" on the Y-axis and "Benchmark Type" on the X-axis,
>with lines connecting the data points.  This is wrong, since it indicates
>that "Benchmark Type" is a continuous function, and implies that you can
>interpolate between the data points to find values for intermediate types
>of benchmarks.  These graphs should have been bar graphs, or scatter plots,
>or some other type of plot which is more suitable to discrete data.  BTW,
>the graphs look like they were done using CricketGraph on a Mac.  Very
>pleasing to the eye, even if they are misleading. :-)

Actually, they're a mixture of Cricket, Excel, and MacDraft. There was
absolutely no intention of implying interpolation between points. The
good thing about computer graphics is that you can redraw graphs quickly.
The bad thing about it is that you can redraw them OFTEN. We tried
numerous different ways to present the data, including various flavors of
bar graphs and scatter plots, and the ones we ended up with, somewhat to
my surprise, were much easier to understand than the others. Here is the
reasoning:

1) The goal is to show performance of machines and benchmarks, with
   enough of both together to see how they compare on those benchmarks,
   especially to get overall "performance envelopes".
2) If you use the bar graph that shows one benchmark versus a bunch of
   machines, you get a good view of that, but you get no idea of overall
   comparisons.
3) If you use the bar graph that shows several benchmarks versus a bunch
   of machines, the graph gets cluttered very quickly, and it is much
   harder to see a-b comparisons. The most complex graph had 42 data
   points, and it is just incomprehensible if done this way.
4) If you use scatter plots, with no lines connecting them, but little
   symbols instead, you again find it gets cluttered quickly. I've seen
   people end up connecting the symbols to figure out what's happening.
5) If you use "stem-and-leaf" charts (which I like, akin to the way I've
   seen some 8700-780 comparisons done), you get a real good idea of
   overall performance envelopes, but it's hard to do a-b comparisons for
   specific benchmarks, and it's hard to do more than about 2 machines at
   once without really baroque graphics.
6) The line chart makes a few things clear, if you're lucky:
   a) machine A is always faster than machine B, for the set of
      benchmarks (the lines never cross).
   b) machine A is usually faster than B, except on a few benchmarks,
      which are made obvious by line-crossings.
   c) Missing data points are obvious [they're not as obvious on most
      bar graphs].

Point 6c) is important: one of the easiest ways to be misled by the
numbers you get is to have partial sets of data, and to get excited about
extreme cases, without looking at overall patterns. For example, if all I
saw was Dhrystone and character-pushing, I'd sure think a Cray was a slow
machine. [Well, not slow, but not as fast as the price.]

For this whole area, I cannot recommend the following book highly enough.
In my opinion, it's in a league with "The Elements of Style":

Edward R. Tufte, "The Visual Display of Quantitative Information",
Graphics Press, Cheshire, Connecticut, 1983. About $35, worth more.
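
To make questions 1) and 2) above concrete, here is a minimal Python
sketch of the two ways of measuring the gap. The function names are made
up for illustration; the numbers are the 10-vs-7 mips figures quoted
above and the pairs from question 4), with the larger value of each pair
treated as the "claim" (that pairing is an assumption of the sketch).

```python
def shortfall_pct(claim, real):
    """Question 1: percentage of the claim that is missing."""
    return (claim - real) / claim * 100.0


def overstatement_pct(claim, real):
    """Question 2: percentage by which the claim overstates reality."""
    return (claim - real) / real * 100.0


if __name__ == "__main__":
    # (claim, real) pairs: the Sun-4 figures, then the pairs from question 4,
    # with the larger number of each pair assumed to be the claim.
    pairs = [(10.0, 7.0), (4.0, 3.5), (10.0, 9.5), (60.0, 59.5)]
    for claim, real in pairs:
        print(f"claim={claim:g} real={real:g}: "
              f"shortfall {shortfall_pct(claim, real):.1f}%, "
              f"overstatement {overstatement_pct(claim, real):.1f}%")
```

For the 10-vs-7 case the same gap comes out as 30% by the first measure
and about 43% by the second, so which denominator you have in mind
matters before you even get to question 3).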
--
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086