Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.lang.fortran
Subject: Re: vectorization question
Message-ID:
Date: 29 Mar 91 16:29:11 GMT
References: <1991Mar29.141313.7418@ariel.unm.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 36
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: prentice@triton.unm.edu's message of 29 Mar 91 14:13:13 GMT

>>>>> On 29 Mar 91 14:13:13 GMT, prentice@triton.unm.edu (John Prentice) said:

John> Consider the following loop:

John>       do 30 k=1,kmax
John>          do 20 j=1,jmax
John>             do 10 i=1,imax
John>                a(i,j,k)=...
John>    10       continue
John>    20    continue
John>    30 continue

John> On the Cray, only the inner most loop will vectorize.

That is not strictly true.  CFT77 will automatically collapse all
three loops if the arrays are all dimensioned (imax,jmax,*).
Furthermore, the parallelizer will strip-mine the collapsed version of
the loop in this case, which will make for the lowest possible
overhead....

John> Does anyone have a suggestion for how to collapse this loop
John> while still using the three dimensional array? [....]

No matter what you do, you will need to know the leading dimensions of
the arrays.  If (imax,jmax) are not the leading dimensions, then you
can vectorize over the whole array anyway and use a mask.  Whether or
not this will run faster than the inner-loop-vectorized code depends
on too many factors to talk about in general, but you need to take
into account the relative sizes of imax and IDIM (etc), the complexity
of the RHS of the assignment statement, the absolute sizes of imax,
jmax, IDIM, JDIM, and perhaps a few more things.....
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET
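
For illustration, the "vectorize over the whole array and use a mask"
idea can be sketched roughly as follows.  This is only a minimal
sketch under assumed conditions, not what CFT77 generates: the sizes,
the right-hand side, and the names b1, n and nbad are all hypothetical,
and the array is assumed to be declared (idim,jdim,kdim) with only the
(imax,jmax,kmax) corner in use.  The 3-D array is viewed as a 1-D
array through EQUIVALENCE so that a single long loop covers it.

```fortran
      program collap
c     Hypothetical sizes: declared extents (idim,jdim,kdim),
c     active extents (imax,jmax,kmax).
      integer idim, jdim, kdim, imax, jmax, kmax
      parameter (idim=5, jdim=4, kdim=3)
      parameter (imax=3, jmax=2, kmax=3)
      real a(idim,jdim,kdim), b(idim,jdim,kdim)
      real b1(idim*jdim*kdim)
      integer i, j, k, n, nbad
c     b1 is a 1-D view of the same storage as b
      equivalence (b, b1)

c     Zero both arrays over their full declared extents
      do 30 k=1,kdim
         do 20 j=1,jdim
            do 10 i=1,idim
               a(i,j,k) = 0.0
               b(i,j,k) = 0.0
   10       continue
   20    continue
   30 continue

c     Reference version: the original triple loop
      do 60 k=1,kmax
         do 50 j=1,jmax
            do 40 i=1,imax
               a(i,j,k) = real(i + 10*j + 100*k)
   40       continue
   50    continue
   60 continue

c     Collapsed version: one loop over all of storage, with a mask
c     that skips elements outside the active (imax,jmax,kmax) corner
      do 70 n=1,idim*jdim*kdim
c        recover (i,j,k) from the linear index n
         i = mod(n-1, idim) + 1
         j = mod((n-1)/idim, jdim) + 1
         k = (n-1)/(idim*jdim) + 1
         if (i.le.imax .and. j.le.jmax .and. k.le.kmax) then
            b1(n) = real(i + 10*j + 100*k)
         endif
   70 continue

c     Count elements where the two versions disagree
      nbad = 0
      do 100 k=1,kdim
         do 90 j=1,jdim
            do 80 i=1,idim
               if (a(i,j,k) .ne. b(i,j,k)) nbad = nbad + 1
   80       continue
   90    continue
  100 continue
      print *, 'mismatches:', nbad
      end
```

The index arithmetic inside loop 70 is written out for clarity only.
If the right-hand side does not need i, j and k, or if the mask is
computed once and reused, the masked loop gets much cheaper, and that
is one of the factors that decides whether it beats the
inner-loop-vectorized form.  When the arrays really are dimensioned
(imax,jmax,*), the mask is unnecessary and loop 70 reduces to a plain
loop from 1 to imax*jmax*kmax.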