Path: utzoo!attcan!uunet!husc6!bloom-beacon!bu-cs!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: getting rid of branches
Summary: You can do even better
Message-ID: <822@l.cc.purdue.edu>
Date: 2 Jul 88 11:16:28 GMT
References: <1941@pt.cs.cmu.edu> <3208@ubc-cs.UUCP> <1986@pt.cs.cmu.edu> <853@garth.UUCP>
Organization: Purdue University Statistics Department
Lines: 34

In article <853@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
|                                                      How do you move old
|code (`dusty decks') onto a parallel processor? One way is to slice up
|the program into independent pieces that can be combined again later.
|   . . .
|you run two or three loops:
|
|         1                     2                     3
|
|   for i in [0..n-1)     for i in [0..n-1)     for i in [0..n-1)
|      A := compute          A := compute          A := compute
|      B[i] := fn1(A)        B[i] := fn2(a)        combine[i] :=
|                                                     test(A)
|   rof                   rof                   rof
>
> On a 205, if compute, test, and fn1 and fn2 are vectorisable, this entire
> construct can be hand-vectorised by something like
>       compute'  -> promoted A'
>       test(A')  -> bit vector
>       fn1'(A')  -> B[i] if bit[i] set (else discard and ignore faults)
>       fn2'(A')  -> B[i] if bit[i] clear (else discard and ignore faults)
> CFT is supposed to provide a vectorisable conditional expression. FTN200 code
> to vectorise ifs may/may not be completed and released.

Even better is to first construct two vectors: A1, which contains only the
elements of A for which bit[i] is set, and A0, which contains the elements
for which bit[i] is clear. Then compute fn1 only on A1 and fn2 only on A0,
and use bit to merge the two result vectors. What was suggested could be
vectorized on the CRAY or even on SIMD machines; the improvement can be so
vectorized on those machines only with difficulty.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
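
The compress-then-merge scheme described in the post can be sketched roughly
as follows. This is only an illustration in scalar Python, not 205 or CRAY
code: the function name compress_merge and the sample test, fn1 and fn2 are
hypothetical, and plain lists stand in for vector registers. The point it
shows is that fn1 and fn2 each see only the elements selected for them, so
no results have to be computed and then discarded.

```python
import math
from itertools import compress


def compress_merge(a, test, fn1, fn2):
    """Apply fn1 where test holds and fn2 elsewhere, via compress and merge.

    Hypothetical helper for illustration; lists stand in for vectors.
    """
    # Bit vector: one boolean per element of a.
    bits = [bool(test(x)) for x in a]

    # Compress: A1 holds the elements with the bit set, A0 the rest.
    a1 = list(compress(a, bits))
    a0 = list(compress(a, [not b for b in bits]))

    # Each function runs only on its own (shorter) vector.
    r1 = [fn1(x) for x in a1]
    r0 = [fn2(x) for x in a0]

    # Merge: walk the bit vector, drawing the next result from r1 or r0.
    it1, it0 = iter(r1), iter(r0)
    return [next(it1) if b else next(it0) for b in bits]


def signed_sqrt_example():
    """Sample use: math.sqrt would fault on the negative elements,
    but it is never called on them."""
    a = [4.0, -9.0, 16.0, -1.0]
    return compress_merge(
        a,
        lambda x: x >= 0,            # test(A)
        math.sqrt,                   # fn1, valid only for x >= 0
        lambda x: -math.sqrt(-x),    # fn2, valid only for x < 0
    )
```

In the masked version quoted above, both functions would be evaluated on
every element and faults on the unwanted ones ignored; here the faulting
cases are never evaluated at all, at the cost of the compress and merge
steps.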