Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!clyde.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!apple!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.benchmarks
Subject: Re: bc benchmark [sigh]
Message-ID: <44342@mips.mips.COM>
Date: 26 Dec 90 06:37:57 GMT
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 152
References:

Having finally caught up with the net after a long trip, I'm sad to see
that 1 out of 3 postings in this newsgroup concern the "bc" benchmark
or some variety thereof.  I had higher hopes for this, especially as
at least some people have read previous discussions in comp.arch.
This %#@!$% thing is like a vampire: every time you think you've finally
put a stake thru its heart, it returns one more time.

1. Small benchmarks are very prone to misinterpretation, prone to
compiler gimmickry, and seldom exercise modern machines very well.
About their only even-slightly-rational use is to compare machines with
the same chips running at different clock rates.  Small, synthetic
benchmarks can easily over- or under-emphasize language and/or machine
features out of all proportion to mixtures found in more realistic
benchmarks.  As a matter of faith, I consider small benchmarks guilty
until proven innocent, i.e., if you can prove their results correlate
well, across product lines, with much more substantial real programs,
then maybe you have something (and in fact, this is a good thing to
have; for instance, I've often thought of offering a small prize for
anyone who can create a small program that predicts performance on the
10 SPEC benchmarks across machine lines, but I haven't figured out how
to describe this well enough to figure out if someone has achieved it.)

2.
Filling the net with timings for a benchmark where no one even explains
what code is being executed, how big it is, whether or not it correlates
with ANYTHING, etc, etc, is like trying to predict the speed of
automobiles by ripping out their steering wheels, and seeing how fast
they roll.

3. NOW, here are SOME FACTS about this benchmark:

	1) It is tiny:
		99.57% of the instruction cycles (on a MIPS machine)
		are accounted for by 10 LINES OF CODE
		71% of the cycles are consumed in 3 LINES OF CODE
	In addition, unlike matrix kernels, whose code is small, but
	whose data references are big, this doesn't even have that
	property: all the code & data fit in tiny caches.

	2) Its instruction usage bears little resemblance to much of
	anything: see Hennessy & Patterson for typical characteristics
	of code.  In particular, this code almost never makes function
	calls, and (on a MIPS machine, which HAS integer multiply and
	divide) spends 50% of the total cycles doing integer multiply
	and divide.  I assure you, this is typical of very few
	programs; this is NOT the kind of statistics that any computer
	architect I know designs machines around, etc, etc.
	(Of course, I should love this benchmark, as it REALLY hurts
	machines with no integer multiply.)

At the end of this posting are the slices of prof & pixstats output.

4. PLEASE STOP WASTING TIME WITH THIS BENCHMARK
(Please, let this be the last stake in its heart :-)

5. ABOUT THE ONLY USEFUL THING I CAN THINK OF TO DO WITH THIS is for
somebody to run this benchmark on many of the machines for which SPEC
integer benchmarks exist, plot the two together, and compute a
correlation for them; or even, pick any one of the SPEC integer
benchmarks and do it for that.  (Or pick some other realistic integer
benchmark for which well-controlled results exist.)
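For anyone who does take up the suggestion in item 5, here is a minimal
sketch of the correlation step, assuming Python.  Everything in the
`results` table is a HYPOTHETICAL placeholder (machine names, bc times,
and SPEC integer ratios alike), not a measurement from any real system,
and the function names are made up for illustration.  It uses the plain
Pearson coefficient; whether that is the right statistic for this
comparison is itself a judgment call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL placeholder rows: (machine, bc time in seconds, SPEC integer ratio).
# Replace these with well-controlled results before drawing any conclusion.
results = [
    ("machine_a", 20.0, 10.0),
    ("machine_b", 12.0, 14.0),
    ("machine_c", 9.0, 22.0),
    ("machine_d", 7.0, 19.0),
]


def bc_vs_spec_correlation(rows):
    """Correlate bc speed (1/time) with the SPEC integer ratio."""
    # Invert the times so that both columns mean "bigger is faster".
    bc_speed = [1.0 / seconds for _, seconds, _ in rows]
    spec_ratio = [ratio for _, _, ratio in rows]
    return pearson(bc_speed, spec_ratio)


if __name__ == "__main__":
    print(f"r = {bc_vs_spec_correlation(results):.3f}")
```

A value of r near 1 across several product lines would be the kind of
evidence item 1 asks for; a handful of machines from one vendor would
not be.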
----------
Profile listing generated Tue Dec 25 13:42:35 1990 with:
   prof -pixie dc

*  -p[rocedures] using basic-block counts;                              *
*  sorted in descending order by the number of cycles executed in each  *
*  procedure; unexecuted procedures are excluded                        *

84303520 cycles

    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line

  84058950   99.71  99.71    1827369     36 mult (dc.c)
    132423    0.16  99.87       4905     37 div (dc.c)
     31153    0.04  99.90        538     21 nalloc (dc.c)
.....

OH GOOD: it spends 99.7% of its time in one function...
IN FACT, going to the next level of detail, where we see the number of
cycles spent in the statements that consumed the time, we discover that
83.7% of the instruction cycles are spent IN JUST 4 LINES OF C....:

*  -h[eavy] using basic-block counts;                                   *
*  sorted in descending order by the number of cycles executed in each  *
*  line; unexecuted lines are excluded                                  *

procedure (file)           line bytes     cycles      %  cum %

mult (dc.c)                1097   100   22754044  26.99  26.99
mult (dc.c)                1094    96   20317562  24.10  51.09
mult (dc.c)                1093    68   16755620  19.88  70.97
mult (dc.c)                1095    36   10771470  12.78  83.74
mult (dc.c)                1098    40    8383670   9.94  93.69
mult (dc.c)                1096    16    4787320   5.68  99.37
mult (dc.c)                1084    80      83600   0.10  99.47
mult (dc.c)                1102    96      45066   0.05  99.52
mult (dc.c)                1087    68      41076   0.05  99.57
nalloc (dc.c)              1974    36      29529   0.04  99.60
div (dc.c)                  665   144      24070   0.03  99.63
mult (dc.c)                1101    96      23606   0.03  99.66
div (dc.c)                  657   124      22139   0.03  99.69
mult (dc.c)                1104    40      20630   0.02  99.71
......

------------
Following is an analysis of instruction usage, on a MIPS R3000-based
machine:

pixstats dc:
  174126742 (2.065) cycles (6.97s @ 25.0MHz)
   84303520 (1.000) instructions	[# instructions]
       1283 (0.000) calls	[basically: never does function calls]
   28881440 (0.343) loads	[a little high]
    8458964 (0.100) stores
   89823222 (1.065) multiply/divide interlock cycles (12/35 cycles)
	[amazingly high: 50% of the time in this code is doing integer
	multiply/divide.
	Real programs do exist like this, but this is completely
	unrepresentative of the vast bulk of integer code....]

   1.36e+05 cycles per call	... like I said: hardly ever does function calls
   6.57e+04 instructions per call

Instruction concentration:
         1   1.4%
         2   2.8%
         4   5.7%
         8  11.4%
        16  22.7%
        32  45.4%
        64  90.8%
       128  99.6%
       256  99.8%
       512  99.9%
      1024 100.0%
      2048 100.0%
      3697 100.0%

THIS SAYS: in a perfect fully-associative cache, 90.8% of the
instruction cycles would be spent in only 64 words (64 instructions),
and 99.9% would fit into 1024 words.... i.e., it fits into almost any
machine's cache...

opcode distribution: [dynamic]
       div   2395317   2.84%
     multu   1197623   1.42%

A PROGRAM WITH TWICE AS MANY INTEGER DIVIDES AS MULTIPLIES....
--
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086