Path: utzoo!utgpu!water!watmath!clyde!rutgers!gatech!hubcap!ncrcae!ncr-sd!hp-sdd!hplabs!pyramid!prls!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Tasting of Dhrystone 2.0 Results
Message-ID: <1939@winchester.mips.COM>
Date: 27 Mar 88 20:07:09 GMT
References: <4076@vdsvax.steinmetz.ge.com> <3505@cbmvax.UUCP> <20970@bu-cs.BU.EDU>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 61

In article <20970@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Wouldn't it be reasonable that, given a series of trials (perhaps by
>different people) that one reports only the best result (assuming
>there's no reason to believe it was due to a total error.)

>I agree that this isn't how one does things in the natural sciences,
>but that isn't what we're dealing with here at all. It seems that
>there are a zillion reasons a benchmark might be slowed down (sudden
>burst of net traffic etc) but I can't think of any good reasons that a
>benchmark, properly compiled and run, would accidently run fast.
>Perhaps getting lucky with a cache, but I don't think that's a concern
>or is meant to be eliminated by the dhrystone methodology....

Well, actually, cache luckiness happens, especially on benchmarks whose
size "roughly" approximates that of the cache(s).  [I say "roughly" because
it's much more complicated than that.]  In particular, suppose you have
machines that have physically-tagged, direct-mapped caches, which is what
many microprocessors with TLBs do, at least in building their external
caches.

Of the RISC machines:
	MIPS RX000:	yes
	AMD 29000:	likely design, if caches used
	SPARC:		some might; Sun-4/2xx uses virtual map
	MC88000:	no, uses set-associative cache chips
Of the CISCs:
	80386:		likely
	68020:		sometimes
	68030:		likely
	VAX 8700:	yes

If virtual->physical mappings are made randomly by the OS, there can be
a wide variance in the performance of some benchmarks, especially if
a small joint (I+D) cache is used (split I & D caches act somewhat more like
2-way set-associative ones).  In particular, you could get performance
versus frequency distributions like:

	rel perf (bigger=faster)	frequency
	1.4				 5%
	1.2				30%
	1.0				50%
	0.8				10%
	0.7				 5%

The best case comes from hitting an unusually good cache arrangement,
and the worst from an unusually bad one.  This can be fixed (statistically)
by careful allocation of mappings: the kernel tries to map any single
program's virtual pages onto the physical cache in a consistent way.
There are two results from doing this:
	a) The variance lessens greatly.
	b) On some programs, the average performance actually improves.

This used to drive us nuts till we fixed it.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
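
To make the random-versus-consistent mapping point concrete, here is a
minimal sketch in Python.  It is not the MIPS kernel's algorithm (the post
does not describe one); it uses the general technique usually called page
coloring as a stand-in.  The page size, cache size, page counts, and every
function name below are assumptions chosen for illustration.  The sketch
only counts how many of a program's pages land on the same slot of a
physically indexed, direct-mapped cache; it does not model run time.

```python
import random

PAGE_SIZE = 4096          # bytes per page (assumed for illustration)
CACHE_SIZE = 64 * 1024    # direct-mapped, physically indexed cache (assumed)
NUM_COLORS = CACHE_SIZE // PAGE_SIZE  # page-sized slots ("colors") in the cache


def color_of(physical_page):
    """Cache slot that a physical page occupies in a direct-mapped cache."""
    return physical_page % NUM_COLORS


def random_mapping(num_virtual_pages, num_physical_pages, rng):
    """Map each virtual page to a distinct, randomly chosen physical page."""
    return rng.sample(range(num_physical_pages), num_virtual_pages)


def consistent_mapping(num_virtual_pages, num_physical_pages):
    """Pick physical pages so that color(physical) == color(virtual).

    Hypothetical allocator: assumes enough free pages of every color exist.
    """
    free = list(range(num_physical_pages))
    mapping = []
    for vpn in range(num_virtual_pages):
        wanted = vpn % NUM_COLORS
        ppn = next(p for p in free if color_of(p) == wanted)
        free.remove(ppn)
        mapping.append(ppn)
    return mapping


def conflicting_pages(mapping):
    """Count pages that share a cache color with at least one other page."""
    counts = {}
    for ppn in mapping:
        color = color_of(ppn)
        counts[color] = counts.get(color, 0) + 1
    return sum(n for n in counts.values() if n > 1)


if __name__ == "__main__":
    rng = random.Random(1988)
    trials = [conflicting_pages(random_mapping(12, 1024, rng))
              for _ in range(1000)]
    print("random mapping, conflicting pages: min", min(trials),
          "max", max(trials))
    print("consistent mapping, conflicting pages:",
          conflicting_pages(consistent_mapping(12, 1024)))
```

With a 12-page program and 16 colors, the consistent mapping places every
page on its own cache slot, so it reports no conflicts on every run.  The
random mapping gives a different conflict count from trial to trial, which
is the kind of run-to-run spread described above.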