Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.lang.fortran
Subject: Re: vectorization question
Message-ID: <MCCALPIN.91Mar29112911@pereland.cms.udel.edu>
Date: 29 Mar 91 16:29:11 GMT
References: <1991Mar29.141313.7418@ariel.unm.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 36
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: prentice@triton.unm.edu's message of 29 Mar 91 14:13:13 GMT

>>>>> On 29 Mar 91 14:13:13 GMT, prentice@triton.unm.edu (John Prentice) said:

John> Consider the following loop:

John>     do 30 k=1,kmax
John>       do 20 j=1,jmax
John>         do 10 i=1,imax
John>           a(i,j,k)=...
John>  10     continue
John>  20   continue
John>  30 continue

John> On the Cray, only the inner most loop will vectorize.  

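That is not strictly true.  CFT77 will automatically collapse all
three loops if every array in the loop is dimensioned (imax,jmax,*),
i.e. if the loop bounds match the declared leading dimensions, so that
the elements touched are contiguous in memory.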

Furthermore, the parallelizer will strip-mine the collapsed version of
the loop in this case, which will make for the lowest possible
overhead....
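
For a compiler that does not collapse the loops itself, the same
transformation can be written by hand.  What follows is only a minimal
sketch of the idea, not a description of what CFT77 does internally.
It assumes the arrays are dimensioned exactly (imax,jmax,kmax), and it
uses a made-up RHS (2.0*b + 1.0) because the original one was elided.
The names collapse_demo, update1d, b and ref are hypothetical.

```fortran
program collapse_demo
  implicit none
  integer, parameter :: imax = 4, jmax = 3, kmax = 2
  real :: a(imax,jmax,kmax), b(imax,jmax,kmax), ref(imax,jmax,kmax)
  integer :: i, j, k

  ! Reference result from the original triple loop.
  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        b(i,j,k)   = real(i + 10*j + 100*k)
        ref(i,j,k) = 2.0*b(i,j,k) + 1.0      ! hypothetical RHS
      end do
    end do
  end do

  ! Collapsed version: one long loop over imax*jmax*kmax elements.
  call update1d(imax*jmax*kmax, b, a)

  ! Self-check: both versions must agree exactly.
  if (any(a /= ref)) stop 1
  print *, 'collapsed loop matches the triple loop'
end program collapse_demo

! Treats the 3-D arrays as 1-D vectors of length n (assumption: the
! loop bounds equal the declared dimensions, so there are no gaps).
subroutine update1d(n, b, a)
  implicit none
  integer, intent(in) :: n
  real, intent(in)    :: b(n)
  real, intent(out)   :: a(n)
  integer :: l
  do l = 1, n
    a(l) = 2.0*b(l) + 1.0
  end do
end subroutine update1d
```

The call relies on sequence association: a rank-3 actual argument may
be passed to an explicit-shape rank-1 dummy, and the elements arrive in
storage order.  This only works when the RHS does not need i, j and k
separately.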

John> Does anyone have a suggestion for how to collapse this loop
John> while still using the three dimensional array?  [....]

No matter what you do, you will need to know the leading dimensions of
the arrays.  If (imax,jmax) are not the leading dimensions, you can
still vectorize over the whole array and use a mask.  Whether this
will run faster than the inner-loop-vectorized code depends on too
many factors to cover in general, but you need to take into account
the relative sizes of imax and the declared dimension IDIM (etc.), the
complexity of the RHS of the assignment statement, the absolute sizes
of imax, jmax, IDIM, JDIM, and perhaps a few more things.....
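
The masked approach might look roughly like the sketch below.  It
assumes arrays declared (idim,jdim,kdim) with only the (imax,jmax,kmax)
corner in use, a precomputed logical mask, and a made-up RHS
(2.0*b + 1.0) standing in for the elided one.  The names masked_demo,
update_masked, mask, b and ref are hypothetical, and a Cray code of
this vintage might well use a vector merge intrinsic instead of an IF.

```fortran
program masked_demo
  implicit none
  integer, parameter :: idim = 6, jdim = 5, kdim = 2   ! declared sizes
  integer, parameter :: imax = 4, jmax = 3, kmax = 2   ! sizes in use
  real    :: a(idim,jdim,kdim), b(idim,jdim,kdim), ref(idim,jdim,kdim)
  logical :: mask(idim,jdim,kdim)
  integer :: i, j, k

  ! Fill b everywhere; mark untouched elements of a and ref with -1.
  do k = 1, kdim
    do j = 1, jdim
      do i = 1, idim
        b(i,j,k) = real(i + 10*j + 100*k)
      end do
    end do
  end do
  a   = -1.0
  ref = -1.0

  ! The mask is true only inside the active (imax,jmax,kmax) corner.
  mask = .false.
  mask(1:imax,1:jmax,1:kmax) = .true.

  ! Reference: the original inner-loop-vectorized triple loop.
  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        ref(i,j,k) = 2.0*b(i,j,k) + 1.0      ! hypothetical RHS
      end do
    end do
  end do

  ! One long loop over the whole declared array, guarded by the mask.
  call update_masked(idim*jdim*kdim, mask, b, a)

  ! Self-check: active elements match, padding is left alone.
  if (any(a /= ref)) stop 1
  print *, 'masked loop matches the triple loop'
end program masked_demo

subroutine update_masked(n, mask, b, a)
  implicit none
  integer, intent(in) :: n
  logical, intent(in) :: mask(n)
  real, intent(in)    :: b(n)
  real, intent(inout) :: a(n)
  integer :: l
  do l = 1, n
    if (mask(l)) a(l) = 2.0*b(l) + 1.0
  end do
end subroutine update_masked
```

Here the long loop does idim*jdim*kdim = 60 iterations to update
imax*jmax*kmax = 24 elements, which is exactly the kind of wasted work
that has to be weighed against the longer vector length.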
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET