Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!mcnc!ecsvax!urjlew
From: urjlew@ecsvax.UUCP (Rostyk Lewyckyj)
Newsgroups: comp.lang.fortran
Subject: Re: Parallelizing Techniques
Summary: corrections
Keywords: parallelizing, compiler, vectorizing
Message-ID: <6795@ecsvax.UUCP>
Date: 11 Apr 89 04:28:01 GMT
References: <17287@cisunx.UUCP> <6790@ecsvax.UUCP>
Organization: UNC Educational Computing Service
Lines: 47

> In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow)
> Sr. Systems Analy./Pgmr., Univ. of Pittsburgh asked
> @
> @ for references to Cray/VAX/FPS (and any other system)
> @ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> @ opportunities for parallelism or vectorization inside and outside the compiler.
> @ ...
> @ I know of the VAX/VMS PPL$ library routines, and I'd like to
> @ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
> @ means of detecting parallelism (or vectorizable code).
> @ --
In article <6790@ecsvax.UUCP>, I urjlew@ecsvax.UUCP (Rostyk Lewyckyj) wrote:
>
> Hie thee to your local IBM representative and let him inform you
> about IBM's parallel Fortran products for the 3090 supercomputers:
> compiler, debugging tools (PTOOL) etc., libraries (ESSL v3.) etc.
> Contact Rice University (Dr. Kemeny ?)

Should have been: Contact Dr. Ken Kennedy (ken@rice.edu), on whose work
IBM's parallel compilers and tools are based.

> ........
> statement across multiple processors economically, so the smallest
> granularity of parallelization is across do loops. i.e. there is
> nothing equivalent to CRAY micro/auto tasking.

Actually, CRAY microtasking and autotasking also parallelize across
DO loops, just as on the IBM. I think that microtasking requires specific
compiler control statements inserted in the code, while autotasking works
more like a compiler switch. I don't know of any CRAY Fortran language
extensions for parallelization.
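As a rough illustration of what "parallelization across DO loops" depends
on, here is a minimal sketch in plain Fortran 77. It uses no vendor
directives or switches. The program name, array names, and sizes are
hypothetical, chosen only for this example, and are not taken from any
CRAY or IBM manual. The first computational loop has independent
iterations and is the kind of loop a compiler could split across
processors; the second carries a dependence from one iteration to the
next and cannot be split as written.

```fortran
      PROGRAM LOOPS
C     Hypothetical example: names and sizes are illustrative only.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N), B(N), C(N), D(N), S(N)
      INTEGER I
C     Fill the input arrays with simple known values.
      DO 10 I = 1, N
         A(I) = REAL(I)
         B(I) = 2.0
         C(I) = 1.0
   10 CONTINUE
C     Independent iterations: D(I) uses only element I of the inputs,
C     so the iterations may run in any order, on any processor.
      DO 20 I = 1, N
         D(I) = A(I)*B(I) + C(I)
   20 CONTINUE
C     Loop-carried dependence: S(I) needs S(I-1) from the previous
C     iteration, so this running sum cannot simply be split up.
      S(1) = D(1)
      DO 30 I = 2, N
         S(I) = S(I-1) + D(I)
   30 CONTINUE
      PRINT *, D(N), S(N)
      END
```

Whether a given compiler actually parallelizes the first loop, and by
what directive or switch, is system specific and not shown here.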
On CRAY YMPs and XMPs the hardware is capable of chaining together
operations of the vector processing units, so that for a loop such as

      DO ... I=1,bigN
         D(I)=A(I)*B(I) + C(I)
      ......

the addition of the results of A(I)*B(I) to the C(I) is started in the
adder pipe before all the multiplications are out of the multiply pipe.
This gives effective within-statement parallelism even for medium-length
vectors. I don't know how the details of dependence checking are done.
Perhaps the compiler analysis for vectorization is enough, and no
further checks are needed for chaining.
-----------------------------------------------
Reply-To: Rostyslaw Jarema Lewyckyj
          urjlew@ecsvax.UUCP, urjlew@tucc.bitnet
          or urjlew@tucc.tucc.edu (ARPA, SURA, NSF etc. internet)
          tel. (919)-962-9107