Path: utzoo!attcan!uunet!ncrlnk!ncrcae!ece-csc!mcnc!xanth!nic.MR.NET!tank!uxc!uxc.cso.uiuc.edu!uxg.cso.uiuc.edu!uxe.cso.uiuc.edu!hirchert From: hirchert@uxe.cso.uiuc.edu Newsgroups: comp.lang.fortran Subject: Re: FORTRAN 88 Message-ID: <50500089@uxe.cso.uiuc.edu> Date: 4 Nov 88 17:42:00 GMT References: <669@convex.UUCP> Lines: 126 Nf-ID: #R:convex.UUCP:669:uxe.cso.uiuc.edu:50500089:000:6904 Nf-From: uxe.cso.uiuc.edu!hirchert Nov 4 11:42:00 1988 Bill Leonard(bill@hcx2.SSD.HARRIS.COM) writes about the cost of using the Fortran 8x array notation on a scalar processor. First, I would like to pick a couple of nits: >As for the array language, let me offer a concrete example. Consider >the following F77 fragment: > S = 0 > DO 10 I = 1,N > DO 20 J = 1,M > B(J,I) = C(I,J) > 20 CONTINUE > K = I + 100 > S = S + A(K) > 10 CONTINUE > Now consider the >equivalent F8x code: >B(1:M,1:N) = TRANSPOSE(C(1:N,1:M)) >S = SUM(A(100:N+100)) ^^^ not that it matters, but this should be 101. >(Clearly, in this example the loop overhead must have been significant, >or the programmer wouldn't have written the F77 code as he did.) On a number of scalar machines I have worked on, the separate loop version might actually be faster (due to locality of reference, instruction caches, and other such effects). Even on machines where the combined loop version is faster, the difference is rarely significant in absolute terms, so that its significance in relative terms is typically important if this computation in embedded in yet another loop so that a significant part of the overall program is spend doing this operation. My experience suggests that while there are a few programmers that would evaluate these kind of issues, most do not. The most likely reasons for the programmer to have combined the loops are . that the effective sizes of A, B, and C are related and the programmer wishes to insure that the size computed on is the same . that the programmer has been told it is more efficient to combine loops and does so without verifying that this is true for his program on his machine . that it takes one less DO statement and one less CONTINUE statement to write the combined loop The last reason (compact notation) may well be the most likely. >This example, while real enough, is not nearly as complicated as what one >typically encounters in real applications. If you mean that the most complicated statement in a typical application would be more complicated you may be right. If you mean that the typical statement in a typical application would be more complicated, I disagree; most applications consist primarily of a large number of very simple statements. > Just as most of the analysis >for vectorization involves making sure that the vectorized code gets the >same answer, the same is true for the "scalarization" of array-language >code. Here I am in total agreement. In fact, this will be true of register-based vector processors (such as those from Cray Research or Alliant) as well. The conversion between parallel notation and efficient sequential notation involves reordering of operations the must be analyzed for validity. For the most part, it doesn't matter which direction you are converting or how far, the basic analysis is the same. As I see it, array notation offers three benefits: . It is a more compact notation (and thus, for some people, a more convenient notation) for expressing many common operations on arrays. . Its basic semantics are more highly parallel (at the cost of higher temporary storage usage). In many cases, analysis can be done to determine the equivalence of this parallel notation and the traditional sequential notation. In cases where this analysis either cannot be done or where it is not done because of cost, the parallel notation will produce better performance on machines which support parallel or overlapped evaluation. (Originally, all the parallel machine were top of the line machines. Today, there are a number of "middle tier" machine with these properties and we are beginning to see add-on boards with these properties for machines in the workstation class. In other words, parallel and overlapped execution issues are affecting an ever increasing fraction of the Fortran community.) . The array notation puts under processor control a number of aspects of array computation where getting good performance is processor dependent. For example, when doing array operations on a two-dimensional array, there are a number of factors the go into the decision of which loop is the inner loop, including locality of reference, memory bank conflicts, and relative length of the two loops. The relative importance of these factors varies from processor to processor, so on one machine it may be best one way while on another machine it may be better the other way. Thus, a sequential statement of the operation will be optimal for only one of those machines. The parallel notation lets the processor choose the inner loop and thus be optimal one both machines. It also has costs: . It adds to the complexity of expression analysis, etc. . In cases where the parallel-sequential equivalency analysis can not or is not done, the parallel notation will produce worse performance on purely sequential machines. (Not necessarily significantly worse, but definitely worse) . In the added factors under the processor's control, if the analysis is done incorrectly (or not done), the wrong decision may be made, producing worse code than what the programmer could have written using the sequential notation. On some machines, the benefits outweigh the costs. On others, the costs outweigh the benefits. From the number of vendors already implementing the array notation, it is clear that the array notation is likely to be available on machines where the benefits outweigh the costs. Given that, I ask the following questions: . Do we want to standardize the notation available on such machines (to improve portability among such machines)? . Is it likely that people working on machines where the costs outweigh the benefits will nevertheless wish to import programs orginally written on machines where the array notation is available? Finally, I might note that even if the array notation is adopted as part of the Fortran standard, it would be possible to leave it out of your compiler if you provide a separate tool that converts the parallel notation to a sequential notation accepted by the compiler. (In this case, the "standard-conforming processor" would be the combination of the compiler and the tool.) On some machines this might be a beneficial implementation strategy: . When importing programs written using the array notation, the cost of analyzing that notation to sequentialize could be paid just once. . When writing codes on the machine, the user could decide whether or not to use the array notation by deciding for himself whether the notational benefits justified the cost of the extra analysis on each translation cycle. Kurt W. Hirchert hirchert@ncsa.uiuc.edu National Center for Supercomputing Applications