Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!sunybcs!boulder!grunwald
From: grunwald@foobar.colorado.edu (Dirk Grunwald)
Newsgroups: comp.arch
Subject: Re: MYRIAS - yet again
Message-ID: <14898@boulder.Colorado.EDU>
Date: 14 Dec 89 21:22:49 GMT
References: <13683@reed.UUCP> <515@ctycal.UUCP> <4218@amelia.nas.nasa.gov>
Sender: news@boulder.Colorado.EDU
Reply-To: grunwald@foobar.colorado.edu
Organization: University of Colorado at Boulder
Lines: 68
In-reply-to: serafini@amelia.nas.nasa.gov's message of 14 Dec 89 06:04:24 GMT

DBS> the hardware since they're trying to build a programming paradigm that will be
DBS> both easy to use and easy to port. They claim that converting old code takes
DBS> hours or days instead of months. Basically anything that can be vectorized
DBS> on a Cray can be parallelized on the Myrias. They downplay the issues of

While it may be possible, I don't think it's practical. According to the talk
Myrias gave here (we have one somewhere; see earlier note), there is no
synchronization possible. Thus, you can't cheaply parallelize

        Do I = 2, N
           A(I) = B(I) * C(I)
           D(I) = A(I-1) * C(I)
        end

On the Cray, this would be vectorized:

        A(2:N) = B(2:N) * C(2:N)
        D(2:N) = A(1:N-1) * C(2:N)

On a machine with synchronization, you could say:

        Doall I = 2, N
           A(I) = B(I) * C(I)
           POST(A,I)
           if (I .GT. 2) WAIT(A,I-1)    ! nobody posts A(1); it is an input
           D(I) = A(I-1) * C(I)
        end

or

        Doall I = 2, N
           A(I) = B(I) * C(I)
        end
        Doall I = 2, N
           D(I) = A(I-1) * C(I)
        end

The Myrias forces the latter, because there is no synchronization. You could
optimize this a little:
        S = (N - 1 + Processors - 1) / Processors  ! strip length, rounded up
        Doall IP = 1, Processors                   ! one strip per processor
           LO = 2 + (IP - 1) * S
           HI = MIN(LO + S - 1, N)
           Do I = LO, HI
              A(I) = B(I) * C(I)
              if (I .NE. LO) D(I) = A(I-1) * C(I)  ! A(I-1) is in this strip
           end
        end
        ! page merge happens here
        Doall IP = 1, Processors                   ! first D(I) of each strip
           LO = 2 + (IP - 1) * S
           if (LO .LE. N) D(LO) = A(LO-1) * C(LO)
        end

(more or less: you just strip-mine the loop based on the number of
processors, execute all the first statements and only those second statements
that are local to your strip, merge pages, and then assign all the
cross-process iterations). But you'll need to force a page merge between the
two Doall loops (I think they call them 'pardo' or something).

It's not clear to me that this is going to be faster than, e.g., a CM-2 or a
Cray. For loops involving no cross-iteration dependence, however, it should
work well. I believe this is what they intended, by the way, because the
designers (a physicist?) had several problems with no cross-iteration
dependence.
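To check the argument, here is a small Python simulation. It is a sketch under a stated assumption: the hypothetical `pardo` models the behaviour described above (each task works on its own copy of memory with no synchronization, and changes are merged only when the pardo ends), but it merges individual array elements rather than pages and assumes tasks write disjoint elements. All function names (`pardo`, `make_memory`, `serial`, `naive_pardo`, `fissioned`, `strip_mined`) are illustrative, and the arrays are padded at index 0 so that `A[i]` lines up with Fortran `A(I)`.

```python
import copy


def pardo(memory, tasks):
    # Hypothetical model of a Myrias-style pardo: every task runs on a
    # private copy of memory taken at loop entry, with no synchronization,
    # and the elements each task changed are merged back only after all
    # tasks finish. Tasks are assumed to write disjoint elements.
    changes = []
    for task in tasks:  # order cannot matter: tasks never see each other
        private = copy.deepcopy(memory)
        task(private)
        for name, arr in private.items():
            for i, value in enumerate(arr):
                if value != memory[name][i]:
                    changes.append((name, i, value))
    for name, i, value in changes:  # the merge, per element rather than per page
        memory[name][i] = value


def make_memory(n):
    """1-based arrays: index 0 is padding so A[i] mirrors Fortran A(I)."""
    return {
        "A": [0] + [10 * i for i in range(1, n + 1)],  # A(1) is an input
        "B": [0] + list(range(1, n + 1)),
        "C": [0] + [i + 1 for i in range(1, n + 1)],
        "D": [0] * (n + 1),
    }


def serial(m, n):
    """The original DO loop on one processor: the reference answer."""
    A, B, C, D = m["A"], m["B"], m["C"], m["D"]
    for i in range(2, n + 1):
        A[i] = B[i] * C[i]
        D[i] = A[i - 1] * C[i]


def naive_pardo(m, n):
    """One pardo over the whole body: D(I) reads a stale A(I-1)."""
    def body(i):
        def task(p):
            p["A"][i] = p["B"][i] * p["C"][i]
            p["D"][i] = p["A"][i - 1] * p["C"][i]
        return task
    pardo(m, [body(i) for i in range(2, n + 1)])


def fissioned(m, n):
    """Two pardos: all A(I) first, then all D(I) after the merge."""
    def set_a(i):
        def task(p):
            p["A"][i] = p["B"][i] * p["C"][i]
        return task

    def set_d(i):
        def task(p):
            p["D"][i] = p["A"][i - 1] * p["C"][i]
        return task

    pardo(m, [set_a(i) for i in range(2, n + 1)])
    pardo(m, [set_d(i) for i in range(2, n + 1)])


def strip_mined(m, n, nproc):
    """One strip per processor, then a fix-up pardo for each strip's first D."""
    s = -(-(n - 1) // nproc)  # strip length: ceil((N-1)/nproc)

    def strip(ip):
        lo = 2 + (ip - 1) * s
        hi = min(lo + s - 1, n)

        def task(p):
            A, B, C, D = p["A"], p["B"], p["C"], p["D"]
            for i in range(lo, hi + 1):
                A[i] = B[i] * C[i]
                if i != lo:  # A(I-1) was computed earlier in this strip
                    D[i] = A[i - 1] * C[i]
        return task

    def fixup(ip):
        lo = 2 + (ip - 1) * s

        def task(p):
            if lo <= n:  # D(LO) needs A(LO-1): previous strip or input A(1)
                p["D"][lo] = p["A"][lo - 1] * p["C"][lo]
        return task

    pardo(m, [strip(ip) for ip in range(1, nproc + 1)])
    pardo(m, [fixup(ip) for ip in range(1, nproc + 1)])  # runs after the merge
```

Running `fissioned` or `strip_mined` on a fresh `make_memory(10)` should give the same arrays as `serial`, for any processor count, while `naive_pardo` gets `A` right but computes most of `D` from stale values of `A(I-1)`. The second pardo in `strip_mined` is the cost of having no synchronization: its only job is to pick up one cross-strip value per strip after the merge.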