Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: comp.arch
Subject: Re: ATTACK OF KILLER MICROS
Message-ID: <36235@lll-winken.LLNL.GOV>
Date: 19 Oct 89 20:02:40 GMT
References: <35825@lll-winken.LLNL.GOV> <1081@m3.mfci.UUCP> <35896@lll-winken.LLNL.GOV> <MCCALPIN.89Oct18103933@masig3.ocean.fsu.edu>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Organization: Lawrence Livermore National Laboratory
Lines: 58

In article <MCCALPIN.89Oct18103933@masig3.ocean.fsu.edu> mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
>I think that it is interesting that you expect the same users who
>can't vectorize their codes on the current vector machines to be able
>to figure out how to parallelize them on these scalable MIMD boxes.
I can only point out specific examples which I have experience with.
For certain Monte Carlo radiation transport codes, vectorization is a
very painful experience which involves much code rewriting to obtain
meager performance increases.  I have direct experience with such
a vectorization effort on a "new," not dusty-deck, code.  We got
at most a factor of 2 in performance from vectorization
on the XMP.  The problem was all the operations performed under masks.
LOTS of wasted cycles.  The same problem, however, was easily coded
in an EXPLICITLY PARALLEL language and achieved an impressive speedup
of 24 on 30 processors of a Sequent Symmetry.  It ran at 2.8 times
XMP performance on hardware costing much less.  We are moving on to
a 126-processor BBN Butterfly-II now, which should deliver more than
40 times the performance of the XMP at similar system cost.
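Why masks waste so many cycles can be illustrated with a deliberately simplified model, sketched here in modern Python under stated assumptions: particle histories are processed in batches of 64 (the length of a Cray vector register), and each batch keeps issuing vector steps until its longest history ends, with finished particles masked off but still occupying an element position.  The exponential history-length distribution and the function names are hypothetical; the model also ignores the other masking cost, executing both sides of every branch.  The helper at the end just restates the 24-on-30 figure as a per-processor efficiency.

```python
import random

def masked_lockstep_efficiency(history_lengths, vector_length=64):
    """Fraction of issued element-steps that do useful work when particle
    histories run in lockstep under masks (hypothetical, simplified model)."""
    useful = 0
    issued = 0
    for start in range(0, len(history_lengths), vector_length):
        batch = history_lengths[start:start + vector_length]
        useful += sum(batch)               # steps that advance a live particle
        issued += max(batch) * len(batch)  # every position busy until the longest history ends
    return useful / issued

def parallel_efficiency(speedup, processors):
    """Speedup per processor, e.g. a speedup of 24 on 30 processors."""
    return speedup / processors

# Hypothetical workload: steps per history drawn from an exponential distribution.
rng = random.Random(1989)
lengths = [1 + int(rng.expovariate(0.1)) for _ in range(10_000)]

print(f"masked vector efficiency: {masked_lockstep_efficiency(lengths):.2f}")
print(f"parallel efficiency:      {parallel_efficiency(24, 30):.2f}")
```

The point of the model is only that lockstep efficiency is set by the spread of history lengths within a batch, while independent processors each finish their own histories without waiting on the slowest lane.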

>It seems to me that the automatic parallelization problem is much
>worse than the automatic vectorization problem, so I think a software
>fix is unlikely....
Automatic vectorization is much easier than automatic parallelization
in a global sense.  That, together with the wide availability of vector
hardware, is why high quality vectorizing compilers exist, and why
automatic GLOBALLY parallelizing compilers don't.  The problem with some
codes is that they must be globally parallelized, and right now an
explicitly parallel lingo is the way to get it done.
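What "globally parallelized" means for a Monte Carlo code can be sketched as follows.  This is a hypothetical Python illustration, not the explicitly parallel language used for the Sequent runs: the unit of parallel work is an entire particle history, a scalar, branchy loop whose length varies from particle to particle, and the programmer farms out whole histories explicitly instead of hoping a compiler finds parallelism inside inner loops.  The slab geometry, absorption probability, and function names are assumptions made for illustration.  Note that CPython threads will not actually speed up CPU-bound code like this; the sketch shows the decomposition, not the performance.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def transport_history(seed, absorb_prob=0.3, max_steps=1000):
    """Follow one hypothetical particle until it is absorbed or escapes.
    Scalar and branchy: step count and branch path differ per particle."""
    rng = random.Random(seed)  # private stream: results do not depend on scheduling
    position, steps, tally = 0.0, 0, 0.0
    while steps < max_steps:
        steps += 1
        position += rng.uniform(-1.0, 1.0)
        if abs(position) > 10.0:         # escaped the slab
            break
        if rng.random() < absorb_prob:   # absorbed: deposit energy and stop
            tally += 1.0
            break
        tally += 0.1                     # scattered: deposit a little, continue
    return steps, tally

def run_parallel(n_particles, workers=4):
    # Explicit parallelism at the outermost level: each task is a whole,
    # independent history, so no compiler has to discover it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transport_history, range(n_particles)))

def run_serial(n_particles):
    return [transport_history(seed) for seed in range(n_particles)]
```

Giving each history its own seeded generator is the design choice that lets the parallel and serial runs produce identical tallies regardless of how histories are scheduled across processors.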

>In fact, I think I can say it much more strongly than that:
>Extrapolating from current experience with MIMD machines, I don't
>think that the fraction of users that can use a scalable MIMD
>architecture is likely to be big enough to support the economies of
>scale required to compete with Cray and their vector machines.  (At
>least for the next 5 years or so).  
I do not agree.  LLNL (a really big user of traditional supercomputers)
has hatched the Massively Parallel Computing Initiative to achieve
this goal on a broad application scale within 3 years.  We will see
what happens...

>What is driving the flight from traditional supercomputers to
>high-performance micros is turnaround time on scalar codes.  From my
>experience, if the code is really not vectorizable, then it is
>probably not parallelizable either, and scalable machines won't scale.
Not true; I have several counterexamples: codes that are highly parallel but scalar.

>The people who can vectorize their codes are still getting 100:1
>improvements going to supercomputers --- my code is over 500 times
>faster on an 8-cpu Cray Y/MP than on a 25 MHz R-3000/3010.  So the
>market for traditional supercomputers won't disappear, it will just be
>more limited than many optimists have predicted.
Yes.  If you use all 8 cpus on the YMP, and each cpu spends most of
its time doing 2 vector reads, a multiply, an add, and one vector
write, all chained up, it will run circles around the current killer
micros, which are tuned for scalar performance.  This situation will
change in the next few years.
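The kernel described above (two vector reads, a multiply, an add, one vector write) is essentially SAXPY, z = alpha*x + y.  The sketch below pairs that kernel with a textbook-style timing model, under assumptions: the start-up latencies are hypothetical, memory-port timing is ignored (even though the post counts the loads and store as chained too), and the model only shows why chaining the multiply into the add approaches 2 flops per clock per cpu.

```python
def saxpy(alpha, x, y):
    """z[i] = alpha * x[i] + y[i]: two vector reads, one multiply,
    one add, and one vector write per element."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def cycles_unchained(n, mul_startup, add_startup):
    # The add cannot start until the multiply has produced all n results.
    return (mul_startup + n) + (add_startup + n)

def cycles_chained(n, mul_startup, add_startup):
    # With chaining, each product feeds the adder as it leaves the multiply
    # pipeline, so both units stream concurrently after start-up.
    return mul_startup + add_startup + n

# n = 64 is one vector register's worth; start-up latencies are hypothetical.
n, mul_s, add_s = 64, 7, 6
flops = 2 * n
print("unchained flops/clock:", round(flops / cycles_unchained(n, mul_s, add_s), 2))
print("chained   flops/clock:", round(flops / cycles_chained(n, mul_s, add_s), 2))
```

A scalar micro issuing roughly one operation per clock cannot match a pipeline pair that, once started, retires a multiply and an add every clock; that is the gap the post expects to narrow.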


brooks@maddog.llnl.gov, brooks@maddog.uucp