Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!psuvax1!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.fortran
Subject: Re: Fortran vs. C for numerical work (SUMMARY)
Message-ID: <10444:Nov3008:15:4590@kramden.acf.nyu.edu>
Date: 30 Nov 90 08:15:45 GMT
References: <7097@lanl.gov> <17680:Nov2806:04:1090@kramden.acf.nyu.edu> <109378@convex.convex.com>
Organization: IR
Lines: 109

Here's a real example, which interested people can probably use for
benchmarks to settle this argument. But if you just care about the
theoretical point, skip to ``extra''.

Consider the following array code:

  for (j = 0;j < N;j++)
    a[i][j][k] *= y;

Here i and k were computed in some non-vectorizable way. i ranges from 0
through M - 1, and k ranges from 0 through P - 1.

With flat arrays in C (or Fortran) you might say instead

  for (j = 0;j < N;j++)
    a[(i * N + j) * P + k] *= y;

Any sane vectorizing compiler (e.g., Convex's) will recognize this as
the same code as above, even if the programmer sweeps through the array
with a pointer instead of j. It will convert both examples into vector
code like ``add (i * N * P + k) * sizeof(double) to the base of a, then
do a vector multiply by y starting there, stride P * sizeof(double),
ending N * P * sizeof(double).''

In C you can go a bit further. Set up an array of pointers to a[i*N*P]
for each i. Say double *(b[M]), with b[i] == &(a[i * N * P]). Now
a[i*N*P + j*P + k] is the same as b[i][j*P + k], so you can write

  for (j = 0;j < N;j++)
    b[i][j * P + k] *= y;

Now instead of adding (i * N * P + k) * sizeof(double) to the base of a,
the compiler will add i * sizeof(ptr) to the base of b, access that
value, and add k * sizeof(double) to that.

What's the difference? In both cases there's an addition of k, an
addition to an array base, and a multiplication by sizeof(double). In
the first case i is multiplied by N * P. In the second case i is
multiplied by the size of a pointer, and there's a memory access.

N * P can probably be precomputed and stored in a register. But there's
just as good a chance that i * sizeof(ptr) can be precomputed and stored
in a register, and in fact that it can be used in place of the register
holding i. (If neither of these is true, shifting i by 2 bits is faster
than multiplying two integer variables.)

So by using the extra pointers, b[i], we do one memory access instead of
a multiplication.

Is this an advantage? Certainly, on any machine where integer
multiplication is slower than memory accesses (including most RISC
architectures). Note that there are only a few extra pointers, and they
will almost certainly be kept in any memory cache available.

Does this advantage matter? Yes. It's true that most array accesses in
numerical programs are by step---like the N steps of j in the above
example. But even if N is large, a vector machine will run those steps
something like twenty times faster. If N is 500, N/20 is just 25. Many
of these imposing inner loops take less than 80% of a program's time on
such computers. Why ignore optimizations in the other 20%?

But in Fortran you can't do this optimization, because you can't store
that array of pointers. You can precompute the offsets, but you'll be
wasting at least an addition on each reference. If I were a religious
man like some others on USENET, I might conclude that Fortran is not
meant for numerical work.

In article <109378@convex.convex.com> patrick@convex.COM (Patrick F. McGehearty) writes:
> In article <17680:Nov2806:04:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Fortran can't deal with a double-pointer array efficiently because it
> >doesn't have pointers. Simulating pointers efficiently is what I was
> >calling difficult. Do you disagree?
> Actually, simulating pointers in Fortran is not very hard, just a bit
> ugly.  First declare an array SPACE for all data that might be pointed
> to.  Then access it as you please.  For example: VAL = SPACE(IPTR(I)) 

But this is not efficient. Jim Giles has been claiming (incorrectly)
that a sufficiently smart compiler can always avoid the extra addition;
even if he were right, no current compiler can do the job in real
examples like this one.

In C, you can declare IPTR as pointers. You store elements in it as
pointers. You don't think about the addition because it's not there.
In Fortran this is impossible.

> That is, if we save the address of a(i,j) in a register, then
> computing the address of either a(i+1,j) or a(i,j+1) only takes
> a single addition, assuming a rectangular array.  The double-pointer
> array still takes two memory accesses and an addition.

This is yet another common misconception: that using pointers forces you
to throw away the advantages of contiguous arrays. Why are you making
such an assumption?

Using pointers to the top of each hyperplane in a multidimensional array
does not force you to use those pointers just to move sequentially
through the array.

> Also, on leading edge machines, a multiplication is as fast or faster
> than a memory reference, especially if you miss the cache.  As
> killer micros continue to increase their clock rates, this phenomena
> will spread.

What machines are you talking about? On leading-edge RISC architectures,
a multiplication is several times slower than a memory load, and in
practice you will never miss the b[] cache. And on current micros
multiplication is particularly slow.

---Dan