Path: utzoo!dptcdc!jarvis.csri.toronto.edu!mailrus!uflorida!novavax!hcx1!hcx2!bill
From: bill@hcx2.SSD.HARRIS.COM
Newsgroups: comp.lang.fortran
Subject: Re: yes vs. no on f8x
Message-ID: <44400036@hcx2>
Date: 17 Apr 89 17:31:00 GMT
References: <24130@beta.lanl.gov>
Lines: 72
Nf-ID: #R:beta.lanl.gov:24130:hcx2:44400036:000:4203
Nf-From: hcx2.SSD.HARRIS.COM!bill    Apr 17 13:31:00 1989


> It is precisely this hard information on what the new standard will
> require compilers to do that is missing in this newsgroups discussions.
> Are there any compiler writers out there that want to enlighten us?

I would love to.  I have been trying to do so for the past year.  Either I
have failed miserably, or you haven't been reading this newsgroup that
long.  :-) Unfortunately, most of my attempts to "enlighten" people about
what FORTRAN/8x will do to compilers have been "misinterpreted", perhaps
partly because I work for (dare I say it?) a VENDOR!  And, as everyone
knows, vendors are always against progress and innovation and children and
Mom and apple pie and patriotism and anything else that's good and decent.

Anyway, I think your example of the array notation is an excellent place to
start.  Probably, users of supercomputers and minisupercomputers will see
very little difference between using the array notation and using DO loops.
Perhaps their compiles will go faster, as the compiler has less work to do,
but they probably already go so fast as to appear instantaneous.

However, users of scalar machines, particularly small scalar machines, will
see a HUGE difference.  Their compiles will probably go much slower, or
their code will execute much slower, or both.  Now, before we go any
further, I have made an assumption: that both sets of users are dealing
with code that has been hand-optimized as much as possible, that the best
algorithms have been used, etc., and that those algorithms were then
converted from DO loops to the 8x array notation.

First, let's consider a relatively simple compiler, the kind you would
probably find on a PC or small workstation.  That compiler will probably
transform each array statement into one, possibly two, loops.  (I say two
because the 8x rules say that, in effect, the entire right-hand side of the
assignment is evaluated prior to altering the left-hand side.  Only in very
simple cases can a compiler trivially detect that this can be avoided
without harm; the other cases require more analysis than our example
compiler would typically want to do, given the horsepower and memory
available.)  Now if your original DO loop was only doing that one
statement, then you have lost nothing (yet).  But, judging from code I have
seen from customers, that is seldom true -- users typically put multiple,
independent operations in one DO loop to save on the loop overhead.  In
that case, you have traded one loop for n loops.
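
To make the "one, possibly two, loops" point concrete, here is a rough
sketch of the overlapping assignment A(2:N) = A(1:N-1).  It is written in
Python purely as a stand-in for the loops a simple compiler might generate;
the function names are invented for illustration and no real compiler is
being described.

```python
def shift_one_loop(a):
    """Naive lowering: a single forward loop that stores as it goes."""
    a = list(a)  # work on a copy so the caller's list is untouched
    for i in range(1, len(a)):
        a[i] = a[i - 1]  # reads an element the loop has already overwritten
    return a


def shift_two_loops(a):
    """Safe lowering: evaluate the whole right-hand side, then store."""
    a = list(a)
    # Loop 1: evaluate the right-hand side into a temporary array.
    tmp = [a[i - 1] for i in range(1, len(a))]
    # Loop 2: copy the temporary into the left-hand side.
    for i in range(1, len(a)):
        a[i] = tmp[i - 1]
    return a
```

Starting from [1, 2, 3, 4], the single forward loop smears the first
element across the array, while the two-loop version shifts the elements
by one position as the 8x rules require.  The price is the second loop
and the temporary.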

Furthermore, any temporary arrays required to evaluate the right-hand side
of the assignment will probably be allocated dynamically, since seldom will
the limits of the arrays involved be static.  That dynamic allocation costs
you, the user, in execution speed, and there will probably be more
temporaries allocated than you might expect, again because the analysis
required to avoid them is just too expensive.
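
As a rough illustration of where those temporaries come from, consider the
array statement A = B*C + D with bounds known only at run time.  The Python
sketch below is an assumption about how an unsophisticated compiler might
proceed (one temporary per intermediate result), not a description of any
particular product; the function names and the allocation counter are
hypothetical.

```python
def eval_with_temporaries(b, c, d):
    """Operator-at-a-time lowering: one temporary per intermediate result."""
    n = len(b)
    allocations = 0

    t1 = [0.0] * n            # temporary for B*C, sized at run time
    allocations += 1
    for i in range(n):
        t1[i] = b[i] * c[i]

    t2 = [0.0] * n            # temporary for the full right-hand side
    allocations += 1
    for i in range(n):
        t2[i] = t1[i] + d[i]

    a = [0.0] * n             # stands in for the existing array A
    for i in range(n):        # final copy into the left-hand side
        a[i] = t2[i]
    return a, allocations


def eval_fused(b, c, d):
    """What a hand-written DO loop does: one pass, no temporaries."""
    n = len(b)
    a = [0.0] * n             # stands in for the existing array A
    for i in range(n):
        a[i] = b[i] * c[i] + d[i]
    return a, 0
```

Both versions produce the same values, but the first makes three passes
over the data and allocates two run-time-sized temporaries where the
hand-written loop needs none.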

Now let's consider a somewhat smarter compiler, one that attempts to do
some optimization, but still far short of attempting the analysis required
by a vectorizer.  This is typically the type of compiler you would see on a
larger scalar machine, like a mini or supermini.  This compiler probably
will do better at detecting when the left- and right-hand sides of an
assignment don't overlap, and when it can avoid allocating a temporary, but
it won't be perfect.  It might also do some loop unrolling of those loops
generated by the array statements, which will partially offset the penalty
of having traded one loop for many.  However, you'll still probably pay an
execution penalty, because that one original DO loop might easily fit in
your machine's instruction cache, but all those unrolled loops quite
probably won't.  You'll pay for this in more memory traffic and slower
execution speed.  But you'll also pay for the increased analysis required
of the compiler in slower compilation speed.
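
The kind of cheap overlap test such a compiler might apply can be sketched
as follows.  This is a deliberately conservative illustration in Python,
with a made-up representation: each array section is a tuple of (array
name, first index, last index) with unit stride, and distinct names are
assumed not to share storage (which ignores EQUIVALENCE and argument
aliasing).

```python
def needs_temporary(lhs, rhs_sections):
    """Return True if a temporary is (conservatively) required.

    lhs and each entry of rhs_sections are (name, first, last) tuples
    describing unit-stride sections with inclusive bounds.
    """
    lname, lfirst, llast = lhs
    for rname, rfirst, rlast in rhs_sections:
        if rname != lname:
            continue  # different arrays: assumed not to overlap
        if (rfirst, rlast) == (lfirst, llast):
            continue  # identical section: element i depends only on itself
        if rlast < lfirst or llast < rfirst:
            continue  # index ranges are disjoint
        return True   # partial overlap: give up and use a temporary
    return False
```

For example, needs_temporary(("A", 2, 10), [("A", 1, 9)]) returns True,
even though running that particular loop backwards would also be safe.
That is the "won't be perfect" part: recognizing such cases takes more
analysis, and more analysis means slower compiles.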

There are many other examples of how the array notation will adversely
impact users of scalar machines.  I hope this has been, at least,
educational to you.

Bill Leonard
Harris Computer Systems Division
2101 W. Cypress Creek Road
Fort Lauderdale, FL  33309
bill@ssd.harris.com or hcx1!bill@uunet.uu.net