Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!texsun!convex!convex.COM
From: patrick@convex.COM (Patrick F. McGehearty)
Newsgroups: comp.benchmarks
Subject: Re: Don't use bc (was: More issues of benchmarking)
Message-ID: <109979@convex.convex.com>
Date: 6 Dec 90 20:37:01 GMT
References: <1990Dec3.204027.16794@cs.utk.edu> <109872@convex.convex.com> <1211@sunc.osc.edu>
Sender: usenet@convex.com
Reply-To: patrick@convex.COM (Patrick F. McGehearty)
Organization: Convex Computer Corporation, Richardson, Tx.
Lines: 78

In article <1211@sunc.osc.edu> djh@xipe.osc.edu (David Heisterberg) writes:
>In article <109872@convex.convex.com> patrick@convex.COM (Patrick F. McGehearty) writes:
>>      program main
>>      real*8 a(256,256),b(256,256),c(256,256)
>>      call matmul(a,b,c,256)
>>...
>>      do k = 1, n
>>        c(i,j) = c(i,j) + a(i,k)*b(k,j)
>>      enddo
>
>Won't Convex performance suffer here due to bank conflicts?  At least on
>a CRAY, the above algorithm would not (or should not) find a place in any
>production code.  The inner loop would be run over "i".

Just goes to show you (and me) how easy it is to not measure what you
think you are measuring.  There are many valid matrix-multiply patterns,
but different patterns give different performance levels, with some better
on some machines than others.  The one I listed was at the top of my
memory because it tests several of the optimization features of the
Convex fc compiler, so I use it occasionally to be sure we don't lose
these features.  In particular, it tests "loop splitting" and "loop
interchange".  Thus, the optimized version of the loop (which is
generated invisibly to the user) is:

      do i = 1,n              ! split off to allow interchange
        do j = 1,n
          c(i,j) = 0.0
        enddo
      enddo
      do k = 1,n
        do j = 1,n
          do i = 1,n          ! interchanged to avoid bank conflicts
            c(i,j) = c(i,j) + a(i,k)*b(k,j)
          enddo
        enddo
      enddo

This version would be better than the original for testing compute
capabilities on most vector machines.
The following version would be better for testing compute capabilities
on some machines with weak optimizers, but worse for vector machines:

      do i = 1,n
        do j = 1,n
          sum = 0.0
          do k = 1,n
            sum = sum + a(i,k)*b(k,j)
          enddo
          c(i,j) = sum
        enddo
      enddo

The weak optimizers repeatedly compute the address of c(i,j) in the first
version, but not in the second.  Unfortunately, the scalar sum causes less
efficient code to be generated for many vector machines.  So the same
problem (matrix multiply) can show either of two machines to be better
depending on small details of the problem statement.  Which reemphasizes
the point that trivial benchmarks mislead the less than expert.  Note
that I carefully do not say 'twit'.  It is very easy to make errors in
benchmark development and analysis.  I have seen all sorts of errors made
in these areas by people who are competent in their own specialties.

Most Computer Science and Computer Engineering curricula provide very
little training in measurement methodology.  I learned much more about
measurement from my Physics and Experimental Psychology courses than I
did from my CS training.  Physics labs teach about experimental error,
and Psychology teaches about experimental design.  The CS I was exposed
to focused on modeling and analysis, with some discussion of modeling
errors.  Given this lack of training in measurement, specialists in the
field need to be aware of the naivety of the users of our results.

A benchmark should have some well-understood relation to the real
purposes for which a machine is to be used in order to have value.  If
the machine is for program development, then measure compilation of
non-trivial programs.  If the machine is for numerical work, measure
some non-trivial application kernels.  If databases, then run some db
ops.  Etc.  Any single performance number without a definition of the
proposed workload is of little value.