Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!yale!cmcl2!lanl!jlg
From: jlg@lanl.gov (Jim Giles)
Newsgroups: comp.lang.fortran
Subject: Re: Fortran vs. C for numerical work (SUMMARY)
Message-ID: <7216@lanl.gov>
Date: 28 Nov 90 20:25:55 GMT
References: <BURLEY.90Nov28084920@pogo.ai.mit.edu>
Organization: Los Alamos Natl Lab, Los Alamos, N.M.
Lines: 53

From article <BURLEY.90Nov28084920@pogo.ai.mit.edu>, by burley@pogo.ai.mit.edu (Craig Burley):
< [...]
< Hmm, could you explain to me how to strength-reduce multiplication to addition
< for random references to array elements within loops, as in
<
<       SUBROUTINE FOO(A)
<       REAL A(37,103)
<       INTEGER I,J
<       DO I,1=100
<          DO J,1=100
<             A(SOMEFUNC(I),OTHERFUNC(J)) = 0.
<          END DO
<       END DO
<       END
<
< where SOMEFUNC and OTHERFUNC may be external functions or statement functions
< with expressions sufficient to ensure that there is no linearity between their
< inputs (I and J) and their outputs (used as array indices)?

I don't understand what relevance this example has to the topic under
discussion.  The fact that this can't be strength reduced is
irrelevant to the question of whether (so-called) flat arrays are
more efficient than multiple indirections (pointer-to-pointer
arrays).  In this example, the flat array requires (in addition to
the function references) a multiply and two adds in the inner loop
(maybe another add as well, if the array is not zero based and not
static).  The pointer-to-pointer array version would require two
adds and an extra memory reference - memory is almost always slower
than int multiplies.

In the most common case (where the indices are _not_ sent to
functions), the strength reduced reference of a flat array costs a
single increment instruction in the inner loop; the pointer-to-pointer
implementation requires the same add (to get to the next element in
the array) AND it _may_ have to do an additional memory reference and
another add (if the inner loop is running 'against the grain' of the
array - with the outer pointer striding fastest: the flat array works
just as efficiently regardless of the order of the loops).

Finally, if the array has more than two dimensions, the number of
additional memory references in the inner loop _may_ go up with the
number of dimensions.  The flat array either gets strength reduced
to a single stride or (in the case you give) has as many extra mults
as the pointer-to-pointer version has extra memory traffic.  So, in
the _usual_ case (where the array is indexed directly by the loop
counters), pointer-to-pointer just breaks even IF the loops are nested
in the same order as the array - otherwise you always lose.  In your
_unusual_ case, the pointer-to-pointer version is still slower except
on those rare machines where memory references are cheaper than mults.
And this still ignores the extra memory required by the arrays of
pointers!

J. Giles