Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!cs.utexas.edu!rice!paco
From: paco@rice.edu (Paul Havlak)
Newsgroups: comp.lang.fortran
Subject: Re: vectorization question
Message-ID: <1991Mar29.165126.11431@rice.edu>
Date: 29 Mar 91 16:51:26 GMT
References: <1991Mar29.141313.7418@ariel.unm.edu>
Sender: news@rice.edu (News)
Organization: Rice University
Lines: 41
Originator: paco@miranda.rice.edu

In article <1991Mar29.141313.7418@ariel.unm.edu>, prentice@triton.unm.edu (John Prentice) writes:
|> Consider the following loop:
|>
|>       do 30 k=1,kmax
|>          do 20 j=1,jmax
|>             do 10 i=1,imax
|>                a(i,j,k)=...
|>    10       continue
|>    20    continue
|>    30 continue
|>
|> On the Cray, only the inner most loop will vectorize.
|> ...

If the Cray compiler really doesn't catch that case, you should
complain loudly to Cray. Multi-dimensional vectorization is not
much harder than single-dimensional vectorization (in this case,
both are trivial). The PFC system at Rice does multi-dimensional
vectorization, as does the commercially available KAP system from
Kuck and Associates.

The failure of compilers to catch such simple cases drives programmers
to try to trick the compiler (see the Perfect benchmark source for
examples). Unfortunately, what tricks one compiler confuses others,
even those that could have properly optimized the original code.

So please, before uglifying your code for a compiler, complain!
You may still have to rewrite the code, but the vendors might
eventually get the message.

Good news about Fortran 90: You can write the above loop in triplet
notation:

        a(1:imax,1:jmax,1:kmax) = ...

Bad news about Fortran 90: If "a" is a formal parameter array (dummy
arg), it might not be contiguous (assuming that array sections passed
as parameters are not implemented by copy-in/copy-out). The inner loop
will be hard to optimize if the stride between array elements is
unknown at compile time. Interprocedural analysis will help, but will
it be enough?

Paul Havlak
"I'd rather optimize Fortran than write it."
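
P.S. A minimal sketch of the two Fortran 90 points above, in case it helps. Everything named here is hypothetical: the module fill_mod, the subroutine fill, the program triplet_demo, the array sizes, and the constants 1.0 and 2.0 (which only stand in for the elided right-hand side of the original loop). It first checks that the triplet assignment gives the same result as the loop nest, then passes a stride-2 section to an assumed-shape dummy argument, which is the case where the callee cannot assume unit stride.

```fortran
module fill_mod
  implicit none
contains
  ! Assumed-shape dummy argument: the caller may pass a strided
  ! section, so the stride of a's first dimension is not known
  ! when this routine is compiled on its own.
  subroutine fill(a, val)
    real, intent(out) :: a(:,:,:)
    real, intent(in)  :: val
    a = val   ! whole-array assignment over whatever section was passed
  end subroutine fill
end module fill_mod

program triplet_demo
  use fill_mod
  implicit none
  ! Hypothetical sizes, chosen only to keep the example small.
  integer, parameter :: imax = 4, jmax = 3, kmax = 2
  real :: a(imax,jmax,kmax), b(imax,jmax,kmax)
  real :: c(2*imax,jmax,kmax)
  integer :: i, j, k

  ! The original loop nest, with 1.0 as a placeholder right-hand side.
  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        a(i,j,k) = 1.0
      end do
    end do
  end do

  ! The same assignment in triplet notation.
  b(1:imax,1:jmax,1:kmax) = 1.0
  if (any(a /= b)) stop 1

  ! Pass every other element along the first dimension. Inside fill,
  ! consecutive elements of the dummy are two reals apart in memory
  ! (unless the implementation copies in and out).
  c = 0.0
  call fill(c(1:2*imax:2,:,:), 2.0)
  if (any(c(1:2*imax:2,:,:) /= 2.0)) stop 2   ! section was written
  if (any(c(2:2*imax:2,:,:) /= 0.0)) stop 3   ! skipped elements untouched

  print *, 'ok'
end program triplet_demo
```

The program stops with a nonzero code if any check fails. Whether a given compiler generates strided code, a contiguous special case, or a temporary copy for the call to fill is implementation-dependent; the sketch only shows that the language permits the non-contiguous case.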