Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!psuvax1!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein) Newsgroups: comp.lang.fortran Subject: Re: Fortran vs. C for numerical work (SUMMARY) Message-ID: <10444:Nov3008:15:4590@kramden.acf.nyu.edu> Date: 30 Nov 90 08:15:45 GMT References: <7097@lanl.gov> <17680:Nov2806:04:1090@kramden.acf.nyu.edu> <109378@convex.convex.com> Organization: IR Lines: 109 Here's a real example, which interested people can probably use for benchmarks to settle this argument. But if you just care about the theoretical point, skip to ``extra''. Consider the following array code: for (j = 0;j < N;j++) a[i][j][k] *= y; Here i and k were computed in some non-vectorizable way. i ranges from 0 through M - 1, and k ranges from 0 through P - 1. With flat arrays in C (or Fortran) you might say instead for (j = 0;j < N;j++) a[(i * N + j) * P + k] *= y; Any sane vectorizing compiler (e.g., Convex's) will recognize this as the same code as above, even if the programmer sweeps through the array with a pointer instead of j. It will convert both examples into vector code like ``add (i * N * P + k) * sizeof(double) to the base of a, then do a vector multiply by y starting there, stride P * sizeof(double), ending N * P * sizeof(double).'' In C you can go a bit further. Set up an array of pointers to a[i*N*P] for each i. Say double *(b[M]), with b[i] == &(a[i * N * P]). Now a[i*N*P + j*P + k] is the same as b[i][j*P + k], so you can write for (j = 0;j < N;j++) b[i][j * P + k] *= y; Now instead of adding (i * N * P + k) * sizeof(double) to the base of a, the compiler will add i * sizeof(ptr) to the base of b, access that value, and add k * sizeof(double) to that. What's the difference? In both cases there's an addition of k, an addition to an array base, and a multiplication by sizeof(double). In the first case i is multiplied by N * P. In the second case i is multiplied by the size of a pointer, and there's a memory access. N * P can probably be precomputed and stored in a register. But there's just as good a chance that i * sizeof(ptr) can be precomputed and stored in a register, and in fact that it can be used in place of the register holding i. (If neither of these is true, shifting i by 2 bits is faster than multiplying two integer variables.) So by using the extra pointers, b[i], we do one memory access instead of a multiplication. Is this an advantage? Certainly, on any machine where integer multiplication is slower than memory accesses (including most RISC architectures). Note that there are only a few extra pointers, and they will almost certainly be kept in any memory cache available. Does this advantage matter? Yes. It's true that most array accesses in numerical programs are by step---like the N steps of j in the above example. But even if N is large, a vector machine will run those steps something like twenty times faster. If N is 500, N/20 is just 25. Many of these imposing inner loops take less than 80% of a program's time on such computers. Why ignore optimizations in the other 20%? But in Fortran you can't do this optimization, because you can't store that array of pointers. You can precompute the offsets, but you'll be wasting at least an addition on each reference. If I were a religious man like some others on USENET, I might conclude that Fortran is not meant for numerical work. In article <109378@convex.convex.com> patrick@convex.COM (Patrick F. McGehearty) writes: > In article <17680:Nov2806:04:1090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >Fortran can't deal with a double-pointer array efficiently because it > >doesn't have pointers. Simulating pointers efficiently is what I was > >calling difficult. Do you disagree? > Actually, simulating pointers in Fortran is not very hard, just a bit > ugly. First declare an array SPACE for all data that might be pointed > to. Then access it as you please. For example: VAL = SPACE(IPTR(I)) But this is not efficient. Jim Giles has been claiming (incorrectly) that a sufficiently smart compiler can always avoid the extra addition; even if he were right, no current compiler can do the job in real examples like this one. In C, you can declare IPTR as pointers. You store elements in it as pointers. You don't think about the addition because it's not there. In Fortran this is impossible. > That is, if we save the address of a(i,j) in a register, then > computing the address of either a(i+1,j) or a(i,j+1) only takes > a single addition, assuming a rectangular array. The double-pointer > array still takes two memory accesses and an addition. This is yet another common misconception: that using pointers forces you to throw away the advantages of contiguous arrays. Why are you making such an assumption? Using pointers to the top of each hyperplane in a multidimensional array does not force you to use those pointers just to move sequentially through the array. > Also, on leading edge machines, a multiplication is as fast or faster > than a memory reference, especially if you miss the cache. As > killer micros continue to increase their clock rates, this phenomena > will spread. What machines are you talking about? On leading-edge RISC architectures, a multiplication is several times slower than a memory load, and in practice you will never miss the b[] cache. And on current micros multiplication is particularly slow. ---Dan