Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!haven!uflorida!novavax!hcx1!hcx2!bill
From: bill@hcx2.SSD.HARRIS.COM
Newsgroups: comp.lang.fortran
Subject: Re: FORTRAN 88
Message-ID: <44400029@hcx2>
Date: 1 Nov 88 14:13:00 GMT
References: <669@convex.UUCP>
Lines: 56
Nf-ID: #R:convex.UUCP:669:hcx2:44400029:000:2813
Nf-From: hcx2.SSD.HARRIS.COM!bill Nov 1 09:13:00 1988

As regards compile time and the array language:

The gentleman who thinks that optimizers only do vectorization is just dead wrong. We (Harris) have a very good optimizer, yet we don't have a vector machine at all.

YOU may not be concerned about compile speed, but our customers certainly are. Consider that our USERS (not us) came up with a feature known as DATAPOOL, which is a variant of COMMON. The difference is that the location of variables within a DATAPOOL is not specified to the compiler but to a separate program; thus references to DATAPOOL variables are resolved at link time, rather than by the compiler as an offset into a COMMON block. The chief advantage is that one can rearrange the variables within the DATAPOOL without recompiling all the program units that reference them; one merely relinks the program. Our users invented this because (re)compilation was too slow!

As for the array language, let me offer a concrete example. Consider the following F77 fragment:

      S = 0
      DO 10 I = 1,N
         DO 20 J = 1,M
            B(J,I) = C(I,J)
   20    CONTINUE
         K = I + 100
         S = S + A(K)
   10 CONTINUE

Now this code was obviously written by a clever programmer: the array A has no relationship to B and C, except that the summation runs over N elements of A, the same trip count as the outer loop. This programmer noticed that he could save the overhead of a separate loop by putting the summation of A inside the same outer loop as the transposition of C into B.
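To see what the clever programmer saved, here is a minimal sketch of the fragment next to the straightforward two-loop version it replaces. It is written in free-form Fortran 90 so a current compiler accepts it; the module and routine names (FUSION_DEMO, FUSED, SPLIT) are hypothetical, and A is assumed to hold at least N+100 elements. Both routines compute the same B and S; the split version simply runs the I loop twice.

```fortran
module fusion_demo
  implicit none
contains
  ! The fragment as posted: one pass over I both transposes C into B
  ! and accumulates the sum of A(101) .. A(N+100).
  subroutine fused(n, m, a, b, c, s)
    integer, intent(in) :: n, m
    real, intent(in) :: a(:), c(:, :)
    real, intent(out) :: b(:, :), s
    integer :: i, j, k
    s = 0.0
    do i = 1, n
      do j = 1, m
        b(j, i) = c(i, j)
      end do
      k = i + 100
      s = s + a(k)
    end do
  end subroutine fused

  ! The straightforward version: same results, but the loop control
  ! for I (increment, test, branch) is executed twice.
  subroutine split(n, m, a, b, c, s)
    integer, intent(in) :: n, m
    real, intent(in) :: a(:), c(:, :)
    real, intent(out) :: b(:, :), s
    integer :: i, j
    do i = 1, n
      do j = 1, m
        b(j, i) = c(i, j)
      end do
    end do
    s = 0.0
    do i = 1, n
      s = s + a(i + 100)
    end do
  end subroutine split
end module fusion_demo
```

Calling both on the same inputs and comparing B and S confirms the two forms agree; any timing difference between them is the loop overhead that the hand fusion removed.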
Now consider the equivalent F8x code:

      B(1:M,1:N) = TRANSPOSE(C(1:N,1:M))
      S = SUM(A(101:N+100))

Note that it is impossible for the _programmer_ to use the nice array-language facility AND tell the compiler that it is okay to jam two of the (implied) loops together. The analysis required of the compiler to figure this out (in the general case) is roughly equivalent to that required to vectorize the original F77 code. On a scalar machine, then, the F8x code will run slower unless the compiler does that analysis; how much slower depends on the ratio of the loop overhead to the actual computation. (Clearly, in this example the loop overhead must have been significant, or the programmer wouldn't have written the F77 code as he did.)

This example, while real enough, is not nearly as complicated as what one typically encounters in real applications. Most of the analysis for vectorization goes into making sure that the vectorized code gets the same answer as the original, and the same is true for the "scalarization" of array-language code. The BIG difference is that vectorizers usually run on high-end machines, whereas the "scalarizers" will be running on low- to medium-sized machines. If the analysis is done, the hit on compile time will be much more noticeable; if it isn't done, the hit on execution speed will be just as noticeable. Either way, the user loses.
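For reference, the two array statements can be wrapped in a compilable routine as below, summing A(101) through A(N+100), the elements the F77 loop actually reads. This is a sketch, not anyone's actual scalarizer: the module and routine names (ARRAY_FORM_DEMO, ARRAY_FORM) are hypothetical, and the comments list the minimum a compiler would have to establish before turning the two statements back into the single fused loop of the F77 fragment.

```fortran
module array_form_demo
  implicit none
contains
  ! Array-language (Fortran 90) version of the F77 fragment.
  ! A compiler that wants to fuse these two statements into one loop
  ! over I, as the F77 programmer did by hand, has to establish at least:
  !   1. SUM reads only A and the assignment writes only B, so neither
  !      statement depends on the other and their iterations can be
  !      interleaved;
  !   2. the iteration spaces line up: column I of B pairs with A(I+100),
  !      so both can run over I = 1..N once the A index is shifted by 100.
  subroutine array_form(n, m, a, b, c, s)
    integer, intent(in) :: n, m
    real, intent(in) :: a(:), c(:, :)
    real, intent(out) :: b(:, :)
    real, intent(out) :: s
    b(1:m, 1:n) = transpose(c(1:n, 1:m))
    s = sum(a(101:n + 100))
  end subroutine array_form
end module array_form_demo
```

Without that analysis, the straightforward expansion is one loop nest per statement, which is exactly the loop overhead the F77 programmer avoided.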