Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!haven!uflorida!novavax!hcx1!hcx2!bill
From: bill@hcx2.SSD.HARRIS.COM
Newsgroups: comp.lang.fortran
Subject: Re: FORTRAN 88
Message-ID: <44400029@hcx2>
Date: 1 Nov 88 14:13:00 GMT
References: <669@convex.UUCP>
Lines: 56
Nf-ID: #R:convex.UUCP:669:hcx2:44400029:000:2813
Nf-From: hcx2.SSD.HARRIS.COM!bill    Nov  1 09:13:00 1988


As regards compile time and the array language:

The gentleman who thinks that optimizers only do vectorization is just dead
wrong.  We (Harris) have a very good optimizer, yet we don't have a vector
machine at all.  YOU may not be concerned about compile speed, but our
customers certainly are.  Consider that our USERS (not us) came up with a
feature known as DATAPOOL, which is a variant of COMMON.  The difference is
that the location of variables within a DATAPOOL is not specified to the
compiler, but rather to a separate program; thus references to DATAPOOL
variables are resolved at link time, rather than by the compiler as an
offset into a COMMON block.  The chief advantage is that one can rearrange
the variables within the DATAPOOL without recompiling all the program units
that reference them; one merely relinks the program.  Our users invented
this because (re)compilation was too slow!
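
To make the COMMON side of that concrete, here is a minimal sketch.  It
is plain standard Fortran in free source form (Fortran 90 or later), not
DATAPOOL, whose syntax is not shown in this posting; the names BLK and
SHOW are made up for illustration.  Each program unit turns a variable
name into an offset within the block at compile time, so a unit compiled
against an old layout silently picks up the wrong storage:

```fortran
PROGRAM POOL
  IMPLICIT NONE
  REAL :: X, Y
  ! Layout as this unit sees it: X at offset 0, Y right after it.
  COMMON /BLK/ X, Y
  X = 1.0
  Y = 2.0
  CALL SHOW
END PROGRAM POOL

SUBROUTINE SHOW
  IMPLICIT NONE
  REAL :: X, Y
  ! Hypothetical stale layout: this unit still has Y first.  Only the
  ! offsets are shared between units, never the names.
  COMMON /BLK/ Y, X
  ! Reports X as 2.0 and Y as 1.0, the reverse of what POOL stored.
  PRINT *, 'X =', X, ' Y =', Y
END SUBROUTINE SHOW
```

The only cure with COMMON is to recompile every unit that names the
block, which is exactly the cost that resolving the locations at link
time avoids.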

As for the array language, let me offer a concrete example.  Consider
the following F77 fragment:

      S = 0
      DO 10 I = 1,N
         DO 20 J = 1,M
            B(J,I) = C(I,J)
   20    CONTINUE
         K = I + 100
         S = S + A(K)
   10 CONTINUE

Now this code was obviously written by a clever programmer, as the
array A has no relationship to B and C except that the summation
covers N elements of it, the same as the outer loop's trip count.
However, this programmer noticed that he could save the overhead of a
separate summation loop by putting the summation of A inside the
outer loop with the transposition of C into B.  Now consider the
equivalent F8x code:

      B(1:M,1:N) = TRANSPOSE(C(1:N,1:M))
      S = SUM(A(101:N+100))

Note that it is impossible for the _programmer_ to use the nice
array-language facility AND tell the compiler that it is okay to
jam two of the (implied) loops together.  The analysis required
of the compiler to figure this out (in the general case) is roughly
equivalent to that required to vectorize the original F77 code.
On a scalar machine, then, the F8x code will run slower; how much
depends on the ratio of the loop overhead to the actual computation.
(Clearly, in this example the loop overhead must have been significant,
or the programmer wouldn't have written the F77 code as he did.)
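
For anyone who wants to check the equivalence, here is a small test
program.  It is only a sketch: the sizes, the data, and the names
(JAMCHK, B1, B2, B3, S1, S2, S3) are arbitrary choices of mine, and it
is written in free source form for a Fortran 90 or later compiler.  The
summation bounds come from the F77 loop, which reads A(I+100) for
I = 1..N, that is, A(101) through A(N+100).  Version (3) is my guess at
what a scalarizer might emit if it did no jamming analysis at all:

```fortran
PROGRAM JAMCHK
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4      ! small, arbitrary sizes
  REAL :: A(N+100), C(N,M)
  REAL :: B1(M,N), B2(M,N), B3(M,N)
  REAL :: S1, S2, S3
  INTEGER :: I, J, K

  ! Arbitrary test data; small whole numbers so the sums are exact.
  DO I = 1, N+100
     A(I) = REAL(I)
  END DO
  DO J = 1, M
     DO I = 1, N
        C(I,J) = REAL(10*I + J)
     END DO
  END DO

  ! (1) Hand-jammed F77 form: one outer loop does both jobs.
  S1 = 0
  DO I = 1, N
     DO J = 1, M
        B1(J,I) = C(I,J)
     END DO
     K = I + 100
     S1 = S1 + A(K)
  END DO

  ! (2) Array-language form: two independent statements.
  B2(1:M,1:N) = TRANSPOSE(C(1:N,1:M))
  S2 = SUM(A(101:N+100))

  ! (3) Unjammed scalar form: a loop nest, then a separate loop.
  DO I = 1, N
     DO J = 1, M
        B3(J,I) = C(I,J)
     END DO
  END DO
  S3 = 0
  DO I = 101, N+100
     S3 = S3 + A(I)
  END DO

  ! All three should agree; with this data the sum is 101+102+103.
  PRINT *, 'B agree:', ALL(B1 == B2) .AND. ALL(B2 == B3)
  PRINT *, 'S agree:', S1 == S2 .AND. S2 == S3
  PRINT *, 'S =', S1
END PROGRAM JAMCHK
```

Version (3) pays for the I loop twice where version (1) pays once.
Getting from (2) or (3) back to (1) is the jamming step that the
compiler has to prove safe on its own.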

This example, while real enough, is not nearly as complicated as what one
typically encounters in real applications.  Just as most of the analysis
for vectorization involves making sure that the vectorized code gets the
same answer, the same is true for the "scalarization" of array-language
code.  The BIG difference is that vectorizers usually run on high-end
machines, whereas the "scalarizers" will be running on low to medium-sized
machines.  If the analysis is done, the hit on compile time will be much
more noticeable; if it isn't, the hit on execution speed will be just as
noticeable.  Either way, the user loses.
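
A small case shows why the "same answer" analysis cannot be skipped.
This is a sketch with made-up names (SHIFT, A, B, C), written in free
source form for a Fortran 90 or later compiler.  An array assignment
behaves as if the whole right-hand side were evaluated before anything
is stored, so turning an overlapping assignment into the obvious forward
loop changes the result:

```fortran
PROGRAM SHIFT
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), C(N), I

  DO I = 1, N
     A(I) = I
  END DO
  B = A
  C = A

  ! Array-language form: right-hand side is read before any store.
  A(2:N) = A(1:N-1)

  ! Naive scalarization: the forward loop reads elements that it has
  ! already overwritten, so B(1) is smeared across the whole array.
  DO I = 2, N
     B(I) = B(I-1)
  END DO

  ! One safe scalarization: run the loop backward (a temporary array
  ! would also work, at the cost of extra storage and copying).
  DO I = N, 2, -1
     C(I) = C(I-1)
  END DO

  PRINT *, 'array form:', A      ! 1 1 2 3 4
  PRINT *, 'naive loop:', B      ! 1 1 1 1 1
  PRINT *, 'safe loop: ', C      ! 1 1 2 3 4
END PROGRAM SHIFT
```

Deciding which of these is needed for an arbitrary pair of subscript
expressions is the same dependence analysis a vectorizer does, run in
the opposite direction.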