Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!uakari.primate.wisc.edu!sdd.hp.com!spool.mu.edu!uunet!convex!kcollins
From: kcollins@convex.com (Kirby L. Collins)
Newsgroups: comp.arch
Subject: Re: bobstone measurements
Message-ID:
Date: 23 May 91 17:51:06 GMT
References: <1991May23.114733.7945@convex.com> <1991May23.134259.12957@convex.com>
Sender: usenet@convex.com (news access account)
Organization: CONVEX Computer Corporation, Richardson, Tx., USA
Lines: 47
Nntp-Posting-Host: dhostwo.convex.com

In posting results for the Convex C220, Marv neglected to mention that
these results are SCALAR only, with vectorization and parallelization
inhibited.  In fact, the inner loop in this benchmark is quite amenable
to vectorization and parallelization:

Script started on Thu May 23 12:46:35 199
hurst [32]cc -ds -O3 -o bobstone bobstone.c

                    Optimization by Loop for Routine main

Line   Iter.  Reordering       Optimizing / Special   Exec.
Num.   Var.   Transformation   Transformation         Mode
-----------------------------------------------------------------------------
  13   i      Scalar
  16   loc    PARA/VECTOR                             SVZ

Line   Iter.  Analysis
Num.   Var.
-----------------------------------------------------------------------------
  13   i      Inner loop has induction value with varying base or step
  16   loc    Parallel outer strip mine loop

hurst [33]uptime
 12:47pm  up 1 day, 19:38,  3 users,  load average: 0.01, 0.35, 0.96
hurst [34]/bin/time bobstone
Total time (sys+user)       : 1.66 (bobstones)
Page faults (min/maj)       : 5/69
Blocks in input/output      : 0/0
Context switches (vol/invol): 178/16
        0.7 real         1.4 user         0.1 sys

script done on Thu May 23 12:47:15 199

Note that the wall clock time is less than the CPU time (0.7 s real
versus 1.5 s user+sys), since the CPU cycles were distributed across
multiple heads (CPUs).  Hurst is a C240 with four processors and was
lightly loaded at the time.  The speedup from parallel execution was
only a bit more than 2X, which is not uncommon for loops that are both
vectorized and executed in parallel.
The speedup would likely approach 4X only for much larger trip counts of
the loc loop.

Please note that the above is the result of exactly five minutes of
compile-execute-analysis.  Thus I fall into the same trap I often
complain about: generating benchmark numbers without any meaningful
analysis of the results 8-{.

Kirby Collins
Strategic Planner
Convex Computer