Path: utzoo!utgpu!watserv1!watmath!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!usc!snorkelwacker.mit.edu!ai-lab!life!burley From: burley@pogo.ai.mit.edu (Craig Burley) Newsgroups: comp.lang.fortran Subject: Re: strength reduction of array address calculations Message-ID: Date: 29 Nov 90 15:39:57 GMT References: <1990Nov28.213735.25563@murdoch.acc.Virginia.EDU> <1990Nov29.041958.2254@murdoch.acc.Virginia.EDU> Sender: news@ai.mit.edu Organization: /home/fsf/burley/.organization Lines: 252 In-reply-to: gl8f@astsun.astro.Virginia.EDU's message of 29 Nov 90 04:19:58 GMT In article <1990Nov29.041958.2254@murdoch.acc.Virginia.EDU> gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) writes: In article burley@pogo.ai.mit.edu (Craig Burley) writes: > I was responding (somewhat tongue in cheek, though hoping I'd >actually get an incredibly insightful answer) to someone who had said one can >strength-reduce the multiplies away in loops in response to yet another person >who clearly had been talking about, yes, RANDOM references. I was not claiming that I could strength-reduce random references. I was claiming that most references were in fact not random. >So, yes, I know (or am pretty sure) you can't. But I asked the >question anyway, because I wanted to know the poster's answer, or at >least suggest without being overly and publically rude that perhaps he >shouldn't have responded to a post about random references to arrays >saying they can be strength-reduced. Well, I think you've managed to be publically rude ;-) I see the smiley, but I do want to clarify: in no way did I consider Greg's posting, to which I originally responded, silly or anything. It just seemed to be a case of losing track of how the discussion progressed, as most of us do (ok, I know I do it). I did get a little annoyed at the private and public responses to my post talking about non-random references when clearly the particular subtopic had become random references as a subclass of array references in general: "You know, mammals are ..." (big discussion); "Well, that's usually the case, but if you have to deal with horses, they are an exception, because they are..."; "Yes, but most mammals aren't horses, so it really isn't important!" HUH? Sigh. We need an object-oriented newsgroup system (inheritance in particular) because it is really hard to keep track of all these threads of discussion, especially as topics are refined into subtopics without it being clear in postings about subtopics that they are intentionally being discussed as subtopics rather than supertopics. Who wants to trace back the discussion of every thread before replying? Oh well, dream on. >A common example of random accesses are scatter/gather operations. The array >element access computation in the following function in C: > > void foo(a,b,c) > float a[100][100]; > int b[100]; > int c[100]; > { > int i, j; > > for (i = 0; i < 100; ++i) > for (j = 0; j < 100; ++j) > a[b[i]][c[j]] = 3.1415*(i+j); > } This can be strength-reduced, although I would be hard pressed to see how the compiler could do it automagaically. If, in Fortran, you do the gather with each column getting one outer loop iteration, strength reduction is possible. It'd look something like this: DO I = 1, NUMCOLS ICOL = B(I) DO J = 1, NUMROWS(I) A( C(I,J), ICOL ) = 3.1415*(whatever) ENDDO ENDDO Since ICOL is constant across the inner loop, it can be strength reduced. I hope I haven't switched rows and columns, I can never remember which one's which by name although I know which index varies fastest :-) Hmm, first off, I should have specified b[(i+j)/2] in my sample program just to make things sufficiently "random" (and done the equivalent for the Fortran procedure). Next, I should have pointed out that while references to were random, the references to and weren't, so this particular example (some kind of scatter/gather thing, I don't write this kind of code so I'm no expert on what it's used for) still contains array references that are within the scope of discussion of C's pointer accesses vs. Fortran's array accesses (though as has been discussed, with one-dimensional arrays and optimizing compilers, there ain't much overall difference across all computing systems used today). Third, I think Greg's new version of the Fortran has "C(I,J)" where it should have "C(J)" -- but I may just be confused. Fourth, I break out in cold sweats whenever I think about writing anything down in Fortran vs. other languages regarding multi-dimensional arrays due to column-major vs. row-major ordering. But when it actually comes down to it, all I have to do is think about it in my favorite way (which I think is called row-major), and then reverse the dimensions when writing Fortran, and it works. I hope. Greg's example, aside from the third nit above, looks fine to me. Finally, this whole thread has been fascinating and highly informative for two reasons (yes, here's another numbered list -- that's what you get for reading the ramblings of a one-time tech writer): One, I'm saving some of the juicy stuff for review when GNU Fortran is working to serve as optimizations to check for and perhaps implement as time becomes available. Two, it makes me think about how both C and Fortran have fundamental omissions/comissions regarding arrays. C doesn't let you work with multi-dimensional arrays directly, you must express them using arrays of arrays (which is a different means of expression, even though it may or may not be more efficient for a specific problem) or by writing your own subscripting expressions for a single monolithic array (which allows the auto-reshaping-across-procedure-boundary often called for by Fortran programmers). Fortran doesn't let you say that when you are expecting an argument A to a procedure and say DIMENSION A(10,10), you really know that the array passed will itself be dimensioned that way, and that when passing an array to a procedure, it'll expect any optimization info on its known shape to be passed; the result being that, unless inter-procedural optimization is done, arrays must be passed as flat and must be received as flat, and pointer maps to speed up multi-dimensional array accesses can be used only entirely within the given procedure. I don't know how easy or reasonable it would be to add multi-dimensional arrays to C (I can see the newsgroup discussions now: "C is just as good as Fortran for scientific computing"; "No it isn't, C doesn't have true multi-dimensional arrays"; "Well, you can accomplish it using other means"; "That can be inefficient"; "Well, it could be added to C fairly easily"; "Ok, let's add it"; "Hey! Why are you making my favorite language bigger? I don't need this feature!"; "Sure you do, it's great for scientific computing"; "But I don't use C for scientific computing!"). But with Fortran there could be a means for declaring the interface to a procedure as requiring an array of a particular shape, and the definition of that procedure would codify that expectation, so all callers of the procedure would not only know that they had to pass an array of that shape, but that they could also pass any pointer maps that might be useful to access elements of the array optimally on a given architecture, and the procedure could assume the existence of those pointer maps. Gee, maybe that's what some of that stuff in the Fortran 90 spec was trying to do; I'll have to look at it again to see if it makes more sense to me in the light of this idea for a feature. (-: Ultimately I always believe the better language is one in which you can state at the highest level what you want to accomplish. C's requirement for either arrays of arrays or hand-coded subscript calculation (and I don't consider macros or inline functions to affect this line of reasoning) when you really want to simply specify a multi-dimensional array means it is not as good as Fortran in this respect. Fortran's self- imposed requirement that it be able to reshape arrays across procedure calls in all cases, by virtue of not allowing you to say "I know this array is indeed this shape in both the called procedure and in all its calling procedures", means it is not as good as C if you use C's array of arrays approach (where it is a given that the actual and formal arrays are the same shape, sort of). When you are able to say what you mean in a language, then it is better for maintainability, retargetability (in terms of other languages), portability (other machines), and optimization, or at least as good. An optimizing compiler might well be able to optimize multi-dimensional array accesses coded that way, but not coded as low-level expressions a la the approach one would use in C when representing a conceptual multi-dimensional array as a single-dimensioned array in the code. Some compilers might be able to optimize the low-level expressions as well by recognizing patterns. These don't have a real hard time with the higher-level constructs; for compilation speed, they can optimize them directly, or for internal elegance and verifiability they can "lower" them to the equivalent low-level expressions and let the powerful optimization at that level take care of the job. An example of this kind of lowering and letting powerful low-level optimization to good things is taking: DO 10 I=1,100 ... 10 CONTINUE and turning it into: I = 1 (code-visible iterator) TEMP1 = (100-I+1)/1 (# of trips) TEMP2 = 1 (increment value for iterator) T1 IF (TEMP1.LE.0) GOTO T2 (any trips left?) ... 10 CONTINUE I = I + TEMP2 (bump iterator for program) TEMP1 = TEMP1 - 1 (countdown trip) GO TO T1 (next iteration) T2 ... Then a general flow analyzer detects that T1:T2 is a loop and determines whether it fits into one of various models of loops with corresponding optimized implementations. This way, not only is the original DO loop shown optimized via this lowering, but code written like the lowered version also is transformed. Do people really write code like the lowered version? Apparently, when you do things like dead-code elimination, flow reordering, and other things, yes, you can end up with things that are really loops that didn't look that way at first. Useless-variable elimination via def-use chains, along with array element access elimination, not only eliminates some or all of the temporary variables created (either by replacing their values with machine instruction equivalents, like replacing I = I + TEMP2 with the machine's instruction for C's ++I or by at least overlaying the temp with other temps in separate loops) but may eliminate, for example, I as well (if the array references involving I can be strength-reduced into, say, pointers incremented each time the loop iterates, as one would do in C). And some machines loop faster when the loop ends on a value reaching 0 than reaching some other number, which is why TEMP1 is set up to do that by calculating the number of trips. (FYI: It is not valid standard Fortran for the program to change the value of a DO loop iterator while its DO loop is still active, so the above transformation safely ignores the possibility of a change to I during the loop. This is another reason Fortran code can sometimes be faster than "equivalent" C code; and, of course, if the programmer isn't aware of the restriction, violates it by modifying I during the loop and expecting the number of trips to be adjusted accordingly, and doesn't get a compile-time message, the result is buggy code that will work on some systems and not others -- hey, I'm gonna go add just that kind of check to GNU Fortran right now!) The same can be done for array references. Change: A(I,J) to: A[(I-lbound(A,1))+((J-lbound(A,2))*extent(A,1))] where lbound(x,y) is the lower bound of array x in dimension y, hbound (not mentioned above) is the same for the upper bound, and extent(x,y) is hbound(x,y)-lbound(x,y)+1. Then, things like strength reduction and loop induction are still doable, but might well apply to more cases than if restricted to explicit multi- dimensional array accesses. It is possible for an optimizer to strength-reduce too much, i.e. to the point where the code ends up less optimal than it might otherwise have been, on a machine where resource and instruction scheduling is important. It might be that the resulting intermediate code requires too many operations of the adder unit and yet leaves the multiplier unit twiddling its thumbs much of the time. To keep this from happening, one could use a backtracking mechanism directed to guess at a best goal for some code and repeatedly try various optimizations and schedule the results until that goal is achieved, and keep track of the best achievement in the meantime if it isn't -- this'll handle the problem by not keeping the completely strength-reduced version as the best achieved. But I lean towards going ahead and doing the strength-reduction, then providing information for the instruction and resource scheduler so that it can selectively move candidate operations off, in this case, a busy adder unit and onto an idle multiplier unit, and automatically generate any support code that kind of adjustment requires (and make sure things don't get worse in the process), because it's the scheduler that knows just where such movement is needed, not the code that decides whether to strength-reduce or "induct" an expression. But the whole proposition is RISCy... :-) -- James Craig Burley, Software Craftsperson burley@ai.mit.edu