Path: utzoo!utgpu!water!watmath!clyde!rutgers!gatech!hubcap!ncrcae!ncr-sd!hp-sdd!hplabs!pyramid!prls!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Tasting of Dhrystone 2.0 Results
Message-ID: <1939@winchester.mips.COM>
Date: 27 Mar 88 20:07:09 GMT
References: <4076@vdsvax.steinmetz.ge.com> <3505@cbmvax.UUCP> <20970@bu-cs.BU.EDU>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 61

In article <20970@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:

>Wouldn't it be reasonable, given a series of trials (perhaps by
>different people), that one reports only the best result (assuming
>there's no reason to believe it was due to a total error)?

>I agree that this isn't how one does things in the natural sciences,
>but that isn't what we're dealing with here at all. It seems that
>there are a zillion reasons a benchmark might be slowed down (sudden
>burst of net traffic etc) but I can't think of any good reasons that a
>benchmark, properly compiled and run, would accidentally run fast.
>Perhaps getting lucky with a cache, but I don't think that's a concern
>or is meant to be eliminated by the dhrystone methodology....

Well, actually, cache luckiness happens, especially on benchmarks
whose size "roughly" approximates that of the cache(s).  [I say
"roughly" because it's much more complicated than that.]

In particular, suppose you have machines with physically-tagged,
direct-mapped caches, which is what many microprocessor-based systems
with TLBs use, at least for their external caches.
	Of the RISC machines:
		MIPS RX000: yes
		AMD 29000: likely design, if external caches are used
		SPARC: some might; the Sun-4/2xx uses a virtually-addressed cache
		MC88000: no, uses set-associative cache chips
	Of the CISCs:
		80386: likely
		68020: sometimes
		68030: likely
		VAX 8700: yes
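To make the "physically-tagged, direct-mapped" point concrete, here is a minimal Python sketch of how such a cache picks a slot. The cache index comes from the *physical* address, so the OS's choice of page frame, not the program, decides which virtual pages collide. The sizes (64 KB cache, 4 KB pages, 16-byte lines) and the helper names are hypothetical, chosen only for illustration:

```python
# Hypothetical parameters for illustration only.
PAGE_SIZE = 4096            # bytes per page
LINE_SIZE = 16              # bytes per cache line
CACHE_SIZE = 64 * 1024      # direct-mapped, physically indexed cache
NUM_LINES = CACHE_SIZE // LINE_SIZE
NUM_COLORS = CACHE_SIZE // PAGE_SIZE   # page frames that share cache slots repeat every 16 frames


def cache_index(paddr):
    """Slot a physical address occupies in a direct-mapped cache."""
    return (paddr // LINE_SIZE) % NUM_LINES


def translate(vaddr, page_table):
    """Virtual -> physical via a toy page table {virtual page: physical frame}."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


def page_color(frame):
    """Frames with the same color compete for the same cache slots."""
    return frame % NUM_COLORS


def conflicts(vaddr_a, vaddr_b, page_table):
    """True if the two virtual addresses land in the same cache slot."""
    return cache_index(translate(vaddr_a, page_table)) == cache_index(translate(vaddr_b, page_table))


if __name__ == "__main__":
    # Frames 3 and 19 differ by NUM_COLORS, so virtual pages 0 and 1 thrash.
    print(conflicts(0, PAGE_SIZE, {0: 3, 1: 19}))
    # Frames 3 and 4 have different colors, so the same program does not thrash.
    print(conflicts(0, PAGE_SIZE, {0: 3, 1: 4}))
```

The same binary, run twice, can therefore see different conflict patterns purely because the kernel handed it different frames.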

If the OS assigns physical pages to virtual pages essentially at random,
there can be a wide variance in the performance of some benchmarks,
especially if a small joint (I+D) cache is used (split I & D caches
act somewhat more like 2-way set-associative ones).
In particular, you could get performance versus frequency
distributions like:
rel. perf.		frequency
(bigger = faster)	(% of runs)
1.4			 5
1.2			30
1.0			50
0.8			10
0.7			 5
The best case reflects an unusually good cache arrangement,
and the worst an unusually bad one.
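Applied to the question of reporting only the best of several trials, a little arithmetic on the illustrative distribution in the table above shows what gets lost. Its mean is about 1.045, so a reported "best" of 1.4 overstates the typical result by roughly 1.34x. And if each run independently draws a fresh mapping (an assumption of this sketch), the fast bucket is not rare among repeated trials. The helper names below are hypothetical:

```python
# Table above as (relative performance, fraction of runs).
DISTRIBUTION = [(1.4, 0.05), (1.2, 0.30), (1.0, 0.50), (0.8, 0.10), (0.7, 0.05)]


def mean_performance(dist):
    """Expected relative performance over all mappings."""
    return sum(perf * frac for perf, frac in dist)


def prob_best_seen(n, dist=DISTRIBUTION):
    """Chance that at least one of n independent runs lands in the fastest bucket."""
    best = max(perf for perf, _ in dist)
    p_best = sum(frac for perf, frac in dist if perf == best)
    return 1 - (1 - p_best) ** n


mean_perf = mean_performance(DISTRIBUTION)
best_perf = max(perf for perf, _ in DISTRIBUTION)

if __name__ == "__main__":
    print(f"mean = {mean_perf:.3f}, best = {best_perf}, best/mean = {best_perf / mean_perf:.2f}")
    for n in (1, 5, 10, 20):
        print(f"P(best of {n} runs shows {best_perf}) = {prob_best_seen(n):.2f}")
```

With ten trials, about 40% of the time someone will see the 1.4 figure, even though it reflects lucky page placement rather than the machine.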

This can be fixed (statistically) by careful allocation of mappings:
the kernel tries to map any single program's virtual pages onto
the physical cache in a consistent way.  There are two results from
doing this:
	a) The variance lessens greatly.
	b) On some programs, the average performance actually improves.

This used to drive us nuts till we fixed it.
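The kernel policy described above is commonly called page coloring. The toy simulation below contrasts it with random frame assignment. It is a sketch, not the MIPS kernel's actual code. The cache geometry, the 4096-frame memory, the 12-page (75%-of-cache) working set, and all function names are assumptions for illustration. "Colored" allocation gives each virtual page a free frame whose cache color equals its own virtual page number mod the number of colors, so the miss count is identical on every trial (result a). Here it also equals the unavoidable compulsory misses, which is below the random-allocation average (result b):

```python
import random
import statistics

# Hypothetical cache/VM parameters, chosen only for illustration.
PAGE_SIZE = 4096                          # bytes per page
LINE_SIZE = 64                            # bytes per cache line
CACHE_SIZE = 64 * 1024                    # direct-mapped, physically indexed and tagged
NUM_SETS = CACHE_SIZE // LINE_SIZE        # 1024 cache lines
NUM_COLORS = CACHE_SIZE // PAGE_SIZE      # 16 page colors
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64 lines per page
NUM_FRAMES = 4096                         # physical page frames in the machine


def random_mapping(vpages, rng):
    """Give each virtual page an arbitrary distinct physical frame."""
    frames = rng.sample(range(NUM_FRAMES), len(vpages))
    return dict(zip(vpages, frames))


def colored_mapping(vpages, rng):
    """Give each virtual page a free frame whose cache color matches its own."""
    used, mapping = set(), {}
    for vpn in vpages:
        color = vpn % NUM_COLORS
        candidates = [f for f in range(color, NUM_FRAMES, NUM_COLORS) if f not in used]
        frame = rng.choice(candidates)
        used.add(frame)
        mapping[vpn] = frame
    return mapping


def count_misses(mapping, vpages, passes):
    """Sweep every line of every page `passes` times through a direct-mapped cache."""
    cache = [None] * NUM_SETS  # physical line number currently held in each slot
    misses = 0
    for _ in range(passes):
        for vpn in vpages:
            base = mapping[vpn] * LINES_PER_PAGE
            for offset in range(LINES_PER_PAGE):
                pline = base + offset
                slot = pline % NUM_SETS
                if cache[slot] != pline:
                    cache[slot] = pline
                    misses += 1
    return misses


def trial_misses(allocator, trials=100, seed=1):
    """Miss counts for a 12-page loop under a fresh mapping on every trial."""
    rng = random.Random(seed)
    vpages = list(range(12))
    return [count_misses(allocator(vpages, rng), vpages, passes=10) for _ in range(trials)]


if __name__ == "__main__":
    for name, alloc in (("random", random_mapping), ("colored", colored_mapping)):
        m = trial_misses(alloc)
        print(f"{name:8s} min={min(m)} mean={statistics.mean(m):.1f} max={max(m)}")
```

Random allocation rarely gives twelve pages twelve distinct colors out of sixteen, so most trials suffer some conflict misses, and the amount varies from trial to trial. A working set larger than the cache would of course conflict under either policy.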
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086