Path: utzoo!mnetor!uunet!lll-winken!lll-crg.llnl.gov!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.arch
Subject: Re: Memory bank conflicts
Message-ID: <4773@lll-winken.llnl.gov>
Date: 11 Mar 88 18:15:16 GMT
References: <7690@pur-ee.UUCP> <3300021@uiucdcsm> <4712@lll-winken.llnl.gov> <12514@sgi.SGI.COM>
Sender: usenet@lll-winken.llnl.gov
Reply-To: brooks@lll-crg.llnl.gov.UUCP (Eugene D. Brooks III)
Organization: Lawrence Livermore National Laboratory
Lines: 23

In article <12514@sgi.SGI.COM> bron@olympus.SGI.COM (Bron C. Nelson) writes:
>The issue regarding memory bank conflicts usually has to do (no
>surprise) with array accesses.  I seem to recall that someone did a
>study of array indexing (either LLNL or Cray I believe) and concluded
>that for their test cases, about 60% of array accesses had a stride
>of 1 (i.e. the code stepped sequentially through the array in memory
>order), about 20% had stride 2, and about 20% "other".  WARNING: this
>is off the top of my head; probably mis-remembered (the stride 2
My best informant indicates 80% stride 1, 20% "other".  For 2-D
arrays people actively pad to get stride 1 one way and an "odd" stride
the other, so array refs in both dimensions go at the full clip.
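
To make the padding trick concrete, here is a minimal sketch in Python. It
assumes a simple interleaved memory in which word address a lives in bank
a mod n_banks; the bank count (16) and the array size (64 by 64) are
hypothetical values picked for illustration, not figures for any particular
machine. Under that assumption a constant-stride stream visits
n_banks / gcd(stride, n_banks) distinct banks, so an odd stride reaches all
of them when the bank count is a power of two.

```python
from math import gcd

def banks_touched(stride, n_banks):
    """Distinct banks visited by a long constant-stride access stream,
    assuming word address a maps to bank a % n_banks."""
    return n_banks // gcd(stride, n_banks)

N_BANKS = 16           # hypothetical bank count (a power of two)

# Column-major 64x64 array: walking along a row has stride equal to
# the leading dimension.
unpadded_ld = 64       # leading dimension as declared
padded_ld = 65         # leading dimension padded by one word to make it odd

unit_stride_banks = banks_touched(1, N_BANKS)            # walking down a column
unpadded_banks = banks_touched(unpadded_ld, N_BANKS)     # along a row, no padding
padded_banks = banks_touched(padded_ld, N_BANKS)         # along a row, padded

print(unit_stride_banks, unpadded_banks, padded_banks)
```

In this toy model the unpadded row walk keeps hitting a single bank, while
the padded one spreads over all 16, which is the point of padding to an
"odd" stride.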

Of the 20% "other", about 3% is estimated to be random gather.  There
are heavily used codes, however, which use random gather at the 25%
level.  It's just that this might be one code out of 10 or 20. Random
gather "never" runs at the full clip on any machine due to "random
conflicts", and very few machines handle random gather without a "several
clock penalty" per vector element.  Some machines, the names left
unmentioned for my own personal protection, are quite effectively castrated
by random gather in performance terms (even though the random gather
is supported in hardware).
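
A rough way to see why "random conflicts" keep a gather below the full clip
is the toy simulation sketched here. Everything in it is an assumption made
for illustration: 16 banks, a bank that stays busy for 4 clocks after
accepting a request, one request issued per clock, and a stream that simply
stalls when its target bank is busy. It does not model any real machine, and
real penalties can be considerably larger than what this shows.

```python
import random

def clocks_for_stream(addresses, n_banks=16, bank_busy=4):
    """Clocks needed to issue every address, one request per clock at best.
    A bank that accepts a request is busy for bank_busy clocks; a request
    to a busy bank stalls the whole stream until that bank is free."""
    free_at = [0] * n_banks          # first clock each bank can accept again
    clock = 0
    for addr in addresses:
        bank = addr % n_banks
        clock = max(clock, free_at[bank])   # stall if the bank is still busy
        free_at[bank] = clock + bank_busy
        clock += 1                          # this request used one issue slot
    return clock

N = 1024                                    # hypothetical vector length
rng = random.Random(0)                      # fixed seed so runs are repeatable

stride1_clocks = clocks_for_stream(range(N))
gather_clocks = clocks_for_stream([rng.randrange(1 << 20) for _ in range(N)])

print(stride1_clocks / N, gather_clocks / N)   # clocks per element
```

With these parameters the stride 1 stream never revisits a busy bank and
issues one element per clock, while the random index list stalls whenever
two nearby elements land in the same bank.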

The data mentioned above are "estimates" from knowledgeable sources;
such detailed statistics are very difficult to obtain.