Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!think!yale!cmcl2!lanl!lambda!jlg
From: jlg@lambda.UUCP (Jim Giles)
Newsgroups: comp.lang.misc
Subject: Re: Arrays in languages (was: Anyone want to design a language?)
Message-ID: <14256@lambda.UUCP>
Date: 28 Feb 90 20:11:26 GMT
References: <8849@boring.cwi.nl>
Lines: 58

From article <8849@boring.cwi.nl>, by dik@cwi.nl (Dik T. Winter):
> [...]
>  > ...
> (About generating equivalent code.)
> Again aliassing is a problem.  But the code I presented has no problems
> with aliasses unless used on multi-processor systems (only innerproducts
> are used, so vector processors have no problems, unless they are not good
> at that, and some are).

Actually, not true.  At least one form of optimization (loop unrolling)
is inhibited by possible aliasing in your example code.  Loop unrolling
can be effective in speeding up code on purely scalar machines.  A friend
of mine has an algorithm that runs 15% faster on an IBM/PC when the loops
are unrolled once.  In your example, the inner loop can be unrolled, but
not the outer one.

In addition, the vectorization you claim possible won't happen since the
operation in the inner loop is a reduction operation (it sums all the
elements of a vector into a single scalar).  Many vector machines don't
have reduction operators in their vector instruction set.  For those
that do, the instruction is usually nearly as slow as the equivalent
scalar loop.  The most efficient vector form of your algorithm is to
put the summation on the outer loop and vectorize that way.  But this
would be inhibited by aliasing in the C version.  (I reproduce the loops
here for anyone who is still trying to follow this argument.)

                for(i = l1m; i <= u1m; i++) {
                        sum = 0.0;
                        for(j = l2m; j <= u2m; j++) {
                                sum += M(i,j) * V(j);
                        }
                    W(i) = sum;
                }
The faster form is:
                for(i = l1m; i <= u1m; i++) W(i) = 0.0;
                for(j = l2m; j <= u2m; j++) {
                        for(i = l1m; i <= u1m; i++) {
                                W(i) += M(i,j) * V(j);
                        }
                }
Here, _both_ the add and the multiply vectorize.  But, C can't vectorize
this. 

> [... trying to detect aliasing at run-time ...]
> [...]  A simple case I encountered was in a long integer package.  The
> routine to subtract numbers 'SUB(A, B, C)' took two input arrays A and B
> and one output array C.  An attempt was made to detect whether A and B
> where the same.  The test failed on at least one system.

If A and B are input only, then aliasing in Fortran _IS_ allowed since
the subroutine doesn't do anything which would make aliasing cause problems.
Aliasing is only a problem if one or the other of the aliased names is
the target of an assignment.  I can see why they wanted to detect the
aliasing - to save time (return zero without actually having to do the
subtract).  However, failure to detect the aliased inputs will no change
the _MEANING_ of the code, only its efficiency.

J. Giles