Path: utzoo!attcan!uunet!oddjob!uwvax!rutgers!ucsd!ames!claris!apple!voder!pyramid!prls!mips!mash From: mash@mips.COM (John Mashey) Newsgroups: comp.arch Subject: Re: NSIEVE C Source File (long) Message-ID: <2899@winchester.mips.COM> Date: 25 Aug 88 01:50:51 GMT References: <1070@marlin.NOSC.MIL> <1073@marlin.NOSC.MIL> Reply-To: mash@winchester.UUCP (John Mashey) Distribution: comp.arch Organization: MIPS Computer Systems, Sunnyvale, CA Lines: 161 In article <1073@marlin.NOSC.MIL> aburto@marlin.nosc.mil.UUCP (Alfred A. Aburto) writes: >/****************** Start NSIEVE C Source Code *************************/ It is a good thing to publish benchmark sources so people can check themr, and Alfred should commended for doing so. Note, however, that this is not the most meaningful benchmark. See gory details below, including architectural analysis. >/************************************************************************/ >/* Some Results: */ >/* Array Size ------------------ BenchTime (sec) -------------------- */ >/* (Bytes) SUN 3/280 VAX 8600 Turbo-Amiga */ >/* 68020 @25 MHz 68020 @ 14.32 MHz */ >/* ( cc -O ) ( cc -O ) ( cc +2 +L +ff ) */ >/* 8191 0.267 0.250 0.461 ( 0.264 ) */ >/* 10000 0.300 0.283 0.578 ( 0.331 ) */ >/* 20000 0.650 0.783 1.195 ( 0.684 ) */ >/* 40000 1.333 1.767 2.383 ( 1.365 ) */ >/* 80000 2.917 3.750 4.820 ( 2.761 ) */ >/* 160000 7.833 8.033 9.758 ( 5.589 ) */ >/* 320000 17.600 17.717 ------ | */ >/* | */ >/* Scaled to 25 MHz --------------------+ */ >/* */ >/* Average Run Time relative to 10 Iterations and 8191 array size: */ >/* 0.315 0.345 0.484 */ A. Output from MIPS R2000 (in M/120: 16.7MHz, 128K cache, low-latency memory system) Sieve of Eratosthenes (scaled to 10 Iterations) Array Size Primes Last Prime BenchTime (Bytes) Found (Sec) 8191 1899 16381 0.130 10000 2261 19997 0.150 20000 4202 39989 0.320 40000 7836 79999 0.630 80000 14683 160001 1.270 160000 27607 319993 2.580 320000 52073 639997 5.570 Relative to 10 Iterations and the 8191 Array Size: Average BenchTime = 0.131 (sec) Now, some more data on this, or WHY YOU MUST BE CAREFUL WITH SMALL BENCHMARKS! B. First, architectural data: One can see (look for the (#) codes in the data below. (1) The instructions fit in a 32-word I-cache. (2) Essentially 100% of loads and stores are bytes. A more typical number might be 10%. (3) 75% of the loads+stores are stores. A more typical number is 30-35%. (4) 100% of the loads had NO useful work schedulable in the load-delay slot, i.e., in this case, a load is actually costing 2 cycles (+ any cache miss time). A more typical number if 30% (and this is very germane to using this benchmark on RISC machines with pipelining). (5) ~100% of the loads/stores have zero-offsets; a more typical number is 10-15%. [relevant to 29K, for example] pixstats nsieve: 159336572 (1.000) instructions 6392573 (0.040) loads 20105454 (0.126) stores 26498027 (0.166) loads+stores (2) 26498114 (0.166) data bus use 26484254 (0.166) partial word references (2) 33974425 (0.213) branches 0.999 load nops per load (4) 0.759 stores per memory reference (3) 0.999 partial word references per reference (2) Instruction concentration: 1 8.6% 2 17.2% 4 34.4% 8 55.1% 16 87.1% 32 100.0% (1) Load/store displacement 16.6% (5) (the size says how many bits used) size 0-extend 1-extend sign-extend 0 26482960 100.0% 100.0% 29 0.0% 0.0% 1 36 0.0% 100.0% 0 0.0% 0.0% 100.0% 2 0 0.0% 100.0% 29 0.0% 0.0% 100.0% 3 1912 0.0% 100.0% 8 0.0% 0.0% 100.0% 4 1654 0.0% 100.0% 0 0.0% 0.0% 100.0% 9 2 0.0% 100.0% 0 0.0% 0.0% 100.0% 10 18 0.0% 100.0% 0 0.0% 0.0% 100.0% 11 92 0.0% 100.0% 0 0.0% 0.0% 100.0% 12 602 0.0% 100.0% 0 0.0% 0.0% 100.0% 13 60 0.0% 100.0% 0 0.0% 0.0% 100.0% C. Now, what the program was doing: Profile listing generated Wed Aug 24 18:11:22 1988 with: ---------------------------------------------------------------------------- * -p[rocedures] using basic-block counts; * * sorted in descending order by the number of cycles executed in each * * procedure; unexecuted procedures are excluded * ---------------------------------------------------------------------------- 159336572 cycles cycles %cycles cum % cycles bytes procedure (file) /call /line 159286753 99.97 99.97 22755251 14 SIEVE (nsieve.c) 24351 0.02 99.98 43 29 _flsbuf (flsbuf.c) 16695 0.01 99.99 1392 18 _doprnt (doprnt.c) ...rest are in the noise, obviously..... ---------------------------------------------------------------------------- * -h[eavy] using basic-block counts; * * sorted in descending order by the number of cycles executed in each * * line; unexecuted lines are excluded * ---------------------------------------------------------------------------- (source code with line numbering is shown following) procedure (file) line bytes cycles % cum % SIEVE (nsieve.c) 241 12 41148840 25.83 25.83 SIEVE (nsieve.c) 240 8 27432560 17.22 43.04 SIEVE (nsieve.c) 231 20 25527710 16.02 59.06 SIEVE (nsieve.c) 235 16 25527640 16.02 75.08 SIEVE (nsieve.c) 244 16 25527640 16.02 91.11 SIEVE (nsieve.c) 230 4 6381910 4.01 95.11 SIEVE (nsieve.c) 238 16 4422440 2.78 97.89 SIEVE (nsieve.c) 237 8 2211220 1.39 99.27 SIEVE (nsieve.c) 242 4 1105610 0.69 99.97 228 for(i=0 ; i<=size ; i++) 229 { 230 *(flags+i) = true; 231 } 233 for(i=0 ; i<=size ; i++) 234 { 235 if(*(flags+i)) 236 { 237 prime = i + i + 3; 238 for(k = i + prime ; k<=size ; k+=prime) 239 { 240 *(flags+k)=false; 241 } 242 count++; 243 } 244 } D. Summary The statistics of this benchmark bear little resemblance to most substantial programs. This doesn't mean it's a bad benchmark, just that extreme care must be taken with it or any other small benchmark. -- -john mashey DISCLAIMER: UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086