Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: comp.arch
Subject: Re: ATTACK OF KILLER MICROS
Message-ID: <36235@lll-winken.LLNL.GOV>
Date: 19 Oct 89 20:02:40 GMT
References: <35825@lll-winken.LLNL.GOV> <1081@m3.mfci.UUCP> <35896@lll-winken.LLNL.GOV>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Organization: Lawrence Livermore National Laboratory
Lines: 58

In article mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
>I think that it is interesting that you expect the same users who
>can't vectorize their codes on the current vector machines to be able
>to figure out how to parallelize them on these scalable MIMD boxes.

I can only point to specific examples which I have experience with.
For certain Monte Carlo radiation transport codes, vectorization is a
very painful experience which involves much code rewriting to obtain
meager performance increases. I have direct experience with such a
vectorization effort on a "new" code, not a dusty deck. We got a
factor of 2 as the upper bound for the performance increase from
vectorization on the XMP. The problem was all the operations performed
under masks. LOTS of wasted cycles. The same problem, however, was
easily coded in an EXPLICITLY PARALLEL language and obtained an
impressive speedup of 24 on 30 processors of a Sequent Symmetry. It
ran at 2.8 times XMP performance on hardware costing much less. We are
now moving on to a 126-processor BBN Butterfly-II, which should deliver
more than 40 times the performance of the XMP at similar system cost.

>It seems to me that the automatic parallelization problem is much
>worse than the automatic vectorization problem, so I think a software
>fix is unlikely....

Automatic vectorization is much easier than automatic parallelization
in a global sense.
This, in addition to the wide availability of hardware, is why high
quality vectorizing compilers exist and why automatic GLOBALLY
parallelizing compilers don't. The problem with some codes is that
they must be globally parallelized, and right now an explicitly
parallel language is the way to get it done.

>In fact, I think I can say it much more strongly than that:
>Extrapolating from current experience with MIMD machines, I don't
>think that the fraction of users that can use a scalable MIMD
>architecture is likely to be big enough to support the economies of
>scale required to compete with Cray and their vector machines. (At
>least for the next 5 years or so).

I do not agree. LLNL (a really big user of traditional supercomputers)
has hatched the Massively Parallel Computing Initiative to achieve this
goal on a broad application scale within 3 years. We will see what
happens...

>What is driving the flight from traditional supercomputers to
>high-performance micros is turnaround time on scalar codes. From my
>experience, if the code is really not vectorizable, then it is
>probably not parallelizable either, and scalable machines won't scale.

Not true. I have several counterexamples of highly parallel but scalar
codes.

>The people who can vectorize their codes are still getting 100:1
>improvements going to supercomputers --- my code is over 500 times
>faster on an 8-cpu Cray Y/MP than on a 25 MHz R-3000/3010. So the
>market for traditional supercomputers won't disappear, it will just be
>more limited than many optimists have predicted.

Yes, if you use all 8 cpus on the YMP and each cpu spends most of its
time doing 2 vector reads, a multiply and an add, and one vector write,
all chained up, it will run circles around the current killer micros,
which are tuned for scalar performance. This situation will change in
the next few years.

brooks@maddog.llnl.gov, brooks@maddog.uucp
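
A rough illustration of the "operations performed under masks" problem
mentioned above for the Monte Carlo transport code. This is not the
original code, only a toy cost model under stated assumptions: each
particle takes exactly one of several branches, every branch costs one
unit per vector lane, and a branch is executed across a whole vector
chunk (under a mask) whenever at least one particle in the chunk takes
it. The function names and the branch encoding are hypothetical. The
vector length of 64 matches the X-MP's vector registers, and the 24 and
30 passed to parallel_efficiency are the Sequent Symmetry figures quoted
in the post.

```python
import random

VECTOR_LENGTH = 64  # Cray X-MP vector registers hold 64 elements


def masked_efficiency(branch_of, vector_length=VECTOR_LENGTH):
    """Fraction of executed lane-operations that do useful work.

    branch_of[i] is the (hypothetical) index of the branch taken by
    particle i. Assumption: a branch present anywhere in a chunk is
    executed for every lane of that chunk, with a mask discarding the
    results for lanes that did not take it.
    """
    useful = 0
    executed = 0
    for start in range(0, len(branch_of), vector_length):
        chunk = branch_of[start:start + vector_length]
        useful += len(chunk)                     # one useful op per particle
        executed += len(chunk) * len(set(chunk)) # every live branch, every lane
    return useful / executed if executed else 1.0


def parallel_efficiency(speedup, processors):
    """Speedup divided by processor count for an explicitly parallel run."""
    return speedup / processors


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical workload: 6400 particles, each picking one of 4 event types.
    particles = [rng.randrange(4) for _ in range(6400)]
    print("masked vector efficiency:", masked_efficiency(particles))
    print("MIMD efficiency (24 on 30):", parallel_efficiency(24, 30))
```

In this model the vector unit's useful fraction falls toward
1/(number of branches) once the particles in a chunk scatter across all
the branches, whereas independent processors each follow only the
branch their own particle takes.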