Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!mcnc!ecsvax!urjlew
From: urjlew@ecsvax.UUCP (Rostyk Lewyckyj)
Newsgroups: comp.lang.fortran
Subject: Re: Parallelizing Techniques
Summary: corrections
Keywords: parallelizing, compiler, vectorizing
Message-ID: <6795@ecsvax.UUCP>
Date: 11 Apr 89 04:28:01 GMT
References: <17287@cisunx.UUCP> <6790@ecsvax.UUCP>
Organization: UNC Educational Computing Service
Lines: 47

> In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) 
> Sr. Systems Analy./Pgmr., Univ. of Pittsburgh asked
> @ 
> @ 	 for references to Cray/VAX/FPS (and any other system)
> @ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> @ opportunities for parallelism or vectorization inside and outside the compiler.
> @  ... 
> @ 	I know of the VAX/VMS PPL$ library routines, and I'd like to
> @ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
> @ means of detecting parallelism (or vectorizable code).  
> @ --
In article <6790@ecsvax.UUCP>, I (urjlew@ecsvax.UUCP, Rostyk Lewyckyj) wrote:
> 
> Hie thee to your local IBM representative and let him inform you
> about IBM's parallel Fortran products for the 3090 supercomputers:
> compiler, debugging tools (PTOOL) etc., libraries (ESSL v3.) etc.
  
> Contact Rice University (Dr. Kemeny ?)
   That should have been: contact Dr. Ken Kennedy (ken@rice.edu) at Rice
University, on whose work IBM's parallel compilers and tools are based.
  
>       ........ 
> statement across multiple processors economically, so the smallest
> granularity of parallelization is across do loops. i.e. there is
> nothing equivalent to CRAY micro/auto tasking.
 Actually, CRAY microtasking and autotasking also parallelize across
DO loops, just as on the IBM. I think that microtasking requires
specific compiler control statements inserted in the code, while
autotasking works more like a compiler switch. I don't know of any
CRAY Fortran language extensions for parallelization.
On the CRAY Y-MP and X-MP the hardware can chain together
operations of the vector processing units, so that for a loop such as
   DO ... I=1,bigN
    D(I)=A(I)*B(I) + C(I)
    ......
the addition of each product A(I)*B(I) to C(I) is started in the
adder pipe before all the multiplications are out of the multiply
pipe. This gives effective within-statement parallelism even for
medium-length vectors. I don't know how the details of dependence
checking are done. Perhaps the compiler's analysis for vectorization
is enough, and no further checks are needed for chaining.
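A toy timing model may make the chaining argument concrete. It is only a sketch under stated assumptions: one operand pair enters the multiply pipe per clock, the pipe depths MUL_LATENCY and ADD_LATENCY are made-up illustrative numbers (not real CRAY X-MP/Y-MP figures), and the names mul_out, unchained_finish, chained_finish, and vector_fma are hypothetical.

```python
# Toy timing model of vector chaining for D(I) = A(I)*B(I) + C(I).
# ASSUMPTION: MUL_LATENCY and ADD_LATENCY are hypothetical pipe depths in
# clock periods, chosen for illustration; they are not CRAY X-MP/Y-MP figures.
MUL_LATENCY = 7
ADD_LATENCY = 6


def mul_out(i: int) -> int:
    """Clock at which element i's product leaves the multiply pipe,
    assuming one operand pair enters per clock starting at clock 0."""
    return i + MUL_LATENCY


def unchained_finish(n: int) -> int:
    """Clock at which the last sum leaves the adder when the add stream
    may not start until every product (the last is element n-1) is out
    of the multiply pipe."""
    add_start = mul_out(n - 1)
    return add_start + (n - 1) + ADD_LATENCY


def chained_finish(n: int) -> int:
    """With chaining, each product enters the adder on the clock it emerges,
    so the last sum leaves ADD_LATENCY clocks after the last product."""
    return mul_out(n - 1) + ADD_LATENCY


def vector_fma(a, b, c):
    """Element-by-element D(I) = A(I)*B(I) + C(I). No element depends on
    another, which is what allows the two pipes to overlap."""
    return [x * y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    for n in (8, 16, 64, 128):
        print(f"n={n:4d}  unchained={unchained_finish(n):4d}  "
              f"chained={chained_finish(n):4d} clocks")
```

Under these assumptions the unchained schedule takes 2n + 11 clocks and the chained one n + 12, so chaining saves n - 1 clocks and approaches half the time as vectors grow. Real machines add startup, memory, and vector-register-length effects that this model ignores.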
-----------------------------------------------
  Reply-To:  Rostyslaw Jarema Lewyckyj
             urjlew@ecsvax.UUCP ,  urjlew@tucc.bitnet
       or    urjlew@tucc.tucc.edu    (ARPA,SURA,NSF etc. internet)
       tel.  (919)-962-9107