Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!texsun!convex!convex.COM
From: patrick@convex.COM (Patrick F. McGehearty)
Newsgroups: comp.benchmarks
Subject: Re: Don't use bc (was: More issues of benchmarking)
Message-ID: <109979@convex.convex.com>
Date: 6 Dec 90 20:37:01 GMT
References: <1990Dec3.204027.16794@cs.utk.edu> <109872@convex.convex.com> <1211@sunc.osc.edu>
Sender: usenet@convex.com
Reply-To: patrick@convex.COM (Patrick F. McGehearty)
Organization: Convex Computer Corporation, Richardson, Tx.
Lines: 78
In article <1211@sunc.osc.edu> djh@xipe.osc.edu (David Heisterberg) writes:
>In article <109872@convex.convex.com> patrick@convex.COM (Patrick F. McGehearty) writes:
>> program main
>> real*8 a(256,256),b(256,256),c(256,256)
>> call matmul(a,b,c,256)
>>...
>> do k = 1, n
>> c(i,j) = c(i,j) + a(i,k)*b(k,j)
>> enddo
>
>Won't Convex performance suffer here due to bank conflicts? At least on
>a CRAY, the above algorithm would not (or should not) find a place in any
>production code. The inner loop would be run over "i".
Just goes to show you (and me) how easy it is to not measure what you
think you are measuring. There are many valid matrix-multiply patterns,
but different patterns give different performance levels, with some
better on some machines than others. The one I listed was at the top
of my memory because it tests several of the optimization features of
the Convex fc compiler, so I use it occasionally to be sure we don't
lose these features. In particular, it tests "loop splitting" and
"loop interchange". Thus, the optimized version of the loop (which is
generated invisibly to the user) is:
do i = 1,n ! split off to allow interchange
do j = 1,n
c(i,j) = 0.0
enddo
enddo
do k = 1,n
do j = 1,n
do i = 1,n ! interchanged to avoid bank conflicts
c(i,j) = c(i,j) + a(i,k)*b(k,j)
enddo
enddo
enddo
This version would be better for testing compute capabilities on most
vector machines than the original. The following version would be better
for testing compute capabilities on some machines with weak optimizers,
but worse for vector machines:
do i = 1,n
do j = 1,n
sum = 0.0
do k = 1,n
sum = sum + a(i,k)*b(k,j)
enddo
c(i,j) = sum
enddo
enddo
The weak optimizers repeatedly compute the address of c(i,j) in the
first version, but not the second. Unfortunately, the scalar sum
causes less efficient code to be generated for many vector machines.
So the same problem (matrix multiply) can show either of two
machines to be better depending on small details of the problem statement.
Which reemphasizes the point that trivial benchmarks mislead the less
than expert. Note that I carefully do not say 'twit'. It is very easy to make
errors in benchmark development and analysis. I have seen all sorts of
errors made in these areas by people who are competent in their own
specialties. Most Computer Science and Computer Engineering curricula
provide very little training in measurement methodology. I learned much
more about measurement from my Physics and Experimental Psychology courses
than I did from my CS training. Physics labs teach about experimental
error, and Psychology teaches about experimental design. The CS I was
exposed to focused on modeling and analysis, with some discussion of
modeling errors. Given this lack of training in measurement, specialists
in the field need to be aware of the naivety of the users of our results.
A benchmark should have some well-understood relation to the real purposes
for which a machine is to be used in order to have value. If the
machine is for program development, then measure compilation of non-trivial
programs. If the machine is for numerical work, measure some non-trivial
application kernels. If databases, then run some db ops. Etc.
Any single performance number without a definition of the proposed
workload is of little value.