Path: utzoo!attcan!uunet!bu.edu!cs!art
From: art@cs.bu.edu (Al Thompson)
Newsgroups: comp.sys.super
Subject: Re: Cray tidbits
Message-ID: <58230@bu.edu.bu.edu>
Date: 4 Jun 90 16:52:23 GMT
References: <354@garth.UUCP> <1990May23.041119.4359@ux1.cso.uiuc.edu> <390@garth.UUCP>
Sender: news@bu.edu.bu.edu
Reply-To: art@cs.UUCP (Al Thompson)
Distribution: comp
Organization: /usr/lib/news/rn/organization
Lines: 97

In article <390@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
[...]
|
|   What areas?  Data parallelism, I think.   I am seeing deja-vu in the
|   scientific computing community's acceptance of DP; I saw the same thing
|   in its acceptance of UNIX back in the mid-80's (cf. ETA discussion).
|   Those entities that embrace and push DP fastest will be the winners,
|   while those that continue with big iron vector boxes will be crushed
|   by the Killer Micros and go the way of NOS Cybers and AOS Novas.
|
|Data parallelism is an easy way to solve easy-to-parallelize problems.
|It is a poor way to solve hard-to-parallelize problems.  As an example,
|I cite the fact that the CM compilers do not run on the CM...  [Of course,
|most of the problems in the "scientific computing" community are 'easy',
|so that may not be an important point.  Linear algebra exhibits good
|locality of reference and relative independence of calculations.]

That's the point: the "scientific problems" are indeed "easy".  I am
surprised at your compiler comment, since compilers really don't fit the
model.  I realize that's your point, but it really raises the old general
purpose arguments.  Clearly the data parallel model is spectacular for
some applications.  I have been working with a CM for a while now, and
it's really quite a rush to ponder a problem for a while and then suddenly
discover it can be solved in one statement.  After a bit of experience it
becomes clear that new kinds of problem solutions can be implemented.
Finding those implementations, or just searching for them, is quite
illuminating.
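As an illustration of the kind of "one statement" primitive a data parallel machine makes cheap, here is a minimal sketch of a parallel prefix sum (scan) using the doubling-step scheme usually credited to Hillis and Steele: in each round every element updates at once, so n elements need only about log2(n) rounds. This is plain Python simulating the lockstep rounds, not CM code; the function name `lockstep_scan` is hypothetical.

```python
def lockstep_scan(values):
    """Inclusive prefix sum computed in ceil(log2(n)) lockstep rounds.

    Each round models one SIMD step: processor k reads the value held by
    processor k - stride (if that processor exists) and adds it to its own.
    The comprehension builds the new array entirely from the old one, so
    every update in a round sees the same "before" state, as in lockstep
    hardware.
    """
    x = list(values)
    n = len(x)
    stride = 1
    while stride < n:
        x = [x[k] + x[k - stride] if k >= stride else x[k] for k in range(n)]
        stride *= 2
    return x


print(lockstep_scan([3, 1, 4, 1, 5, 9, 2, 6]))  # prints [3, 4, 8, 9, 14, 23, 25, 31]
```

Eight elements take three rounds here; a serial loop would take seven dependent additions. The catch, as the rest of the thread argues, is that the rounds only pay off when there are enough processors to hold all the elements.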

|
|   CRI has embraced DP in a big way in only the last few months, I think.
|   I have a pretty good idea, based totally on supposition, on what the
|   teraflop YMP will look like (hint: think CM.)
|
|I hope it isn't in the same big way they embraced "network
|supercomputing" --- claiming to have invented something which on
|analysis only means what other people call 'interconnectivity.'  If
|that's true, they'll introduce a PARDO construct to Fortran and call
|it data parallelism. (;-)

Isn't it interesting how often this happens?

|
|   I'm not selling my CRI stock anytime soon.
|
|Me either. I can't get the money I put into it back, and I don't
|need the loss on my taxes. (:-(  [CRI is now trading ~47, or about
|half the average cost of CRI stock...]
|
|But seriously.  For data parallelism to be fully effective one needs a
|very high-bandwidth, low-latency interconnect mechanism and problems
|which exhibit high locality of reference.  The power of the processing
|element isn't very important by comparison.  A large part of Thinking
|Machines' success with the SIMD approach in the Connection Machine, as
|opposed to the price-competitive MIMD approach of "hypercube" systems,
|is the very clever and rather quick routing of the machine, coupled
|with the very low synchronization cost built into the lock-step
|approach of SIMD.  Hillis, in his PhD thesis, argues that the
|qualitative value of data parallelism only comes when a very large
|number of processing elements can be effectively utilized in parallel.

That is the point.  You really need a huge number of processors to get the
advantages.  Cursory cost analyses look like they kill the CM, and they do,
too, if your problem is small.  The reason is that the number of
processors is a factor in the cost equation.  In the case of the CM this
number is both fixed and (usually) large.  So, if you have a problem that
doesn't really need a lot of processors, the machine's cost seems prohibitive.
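A back-of-the-envelope sketch of that cost argument, with hypothetical figures and a simplifying assumption of one data element per processor per step: on a machine with a fixed P processors, a problem with only n independent elements keeps just min(n, P) of them busy, so the cost per busy processor climbs steeply as n shrinks. The function name and all numbers are illustrative only.

```python
def cost_per_busy_processor(machine_cost, processors, problem_size):
    """Machine cost divided by the number of processors doing useful work.

    Hypothetical model: one data element per processor per step, so only
    min(problem_size, processors) processors are busy.  The idle ones are
    still part of the (fixed) machine cost.
    """
    busy = min(problem_size, processors)
    return machine_cost / busy


# Hypothetical machine: 65,536 processors costing 1,000,000 units.
P = 65536
for n in (1024, 16384, 65536, 1_000_000):
    print(n, cost_per_busy_processor(1_000_000, P, n))
```

Once n reaches P the figure stops improving in this model; problems larger than the machine are where Hillis's idea of emulating many virtual processors per physical one comes in.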

|
|CRI can build a data parallel machine in one of two ways.  It can
|build a MIMD machine with a medium number of processors and near-zero
|synchronization cost, and use the medium number of processors to model
|the large number of processors Hillis postulates.  Or, it can build a
|system with a large number of processors.  In either case, doing this
|with a MIMD system is a very difficult technical problem because of the
|cost of synchronization.
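A crude timing model of the two options, built on assumptions of our own rather than anything CRI has described: N virtual processors run on P physical ones, and each data parallel step costs ceil(N/P) operation times of serial emulation plus one synchronization. Setting the sync cost to zero stands in for lockstep SIMD; the 50-operation barrier cost for MIMD is a hypothetical number chosen only to show the shape of the trade-off.

```python
import math


def step_time(virtual, physical, t_op, t_sync):
    """Time for one data parallel step in a hypothetical virtual-processor model.

    Each physical processor serially emulates ceil(virtual / physical)
    virtual processors, then all physical processors synchronize once.
    All inputs are illustrative model parameters, not measured figures.
    """
    return math.ceil(virtual / physical) * t_op + t_sync


def sync_overhead(virtual, physical, t_op, t_sync):
    """Fraction of a step spent synchronizing rather than computing."""
    return t_sync / step_time(virtual, physical, t_op, t_sync)


N = 65536
cases = (
    (65536, 0.0),   # one processor per element, lockstep (SIMD-like) sync
    (65536, 50.0),  # one processor per element, costly MIMD barrier
    (64, 50.0),     # medium MIMD machine emulating N virtual processors
)
for P, t_sync in cases:
    t = step_time(N, P, 1.0, t_sync)
    print(P, t_sync, t, round(sync_overhead(N, P, 1.0, t_sync), 3))
```

In this model a large MIMD machine spends almost all of each fine-grained step in the barrier, while the medium machine hides the barrier only by giving up most of the parallelism, which is the dilemma described above.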

You can say that again.  If you are not getting a bunch of processors, you
are better off staying with a superfast von Neumann machine.  See Stone's
article on the search results reported by Thinking Machines.

|
|CRI could possibly build a SIMD implementation of the Y/MP; that is a
|Y/MP instruction set driven data parallel processor.  There are only
|three things needed to do this that they don't currently have the
|expertise for:
|
|1) hardware
|2) software
|3) marketing (;-)
|
|In fact, I can't even imagine such a machine running.  However, I've
|been wrong about enough things that I'll try instead to imagine how
|long it will take Crayless-Cray to produce the machine.

I can't imagine such a machine being SIMD.  The Cray instruction set, if
implemented on each processor, contains data-dependent jumps.  The first
time one of these is executed, poof, you're off in MIMD land.
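A minimal sketch of why data-dependent control flow is the sticking point: a lockstep SIMD array can handle a structured if/else only by setting a per-processor mask bit and then broadcasting each branch to every processor, with masked-off processors sitting idle. A jump to an arbitrary, data-dependent address has no such encoding, since each processor would need its own program counter. The function below is a hypothetical Python illustration of masking, not a description of CM or Cray hardware; skipping a branch when no processor is active is an assumed optimization.

```python
def simd_branch(data, predicate, then_op, else_op):
    """Evaluate a data-dependent if/else the way a lockstep SIMD array must.

    Returns (results, steps), where steps counts broadcast instruction
    steps: one to compute the mask, plus one per branch that at least one
    processor needs.  Inactive processors idle, but the step still costs
    the whole array a full instruction time.
    """
    mask = [predicate(x) for x in data]  # every processor sets its mask bit
    steps = 1
    result = list(data)
    if any(mask):  # broadcast the "then" branch; only masked-on processors act
        result = [then_op(x) if m else r for x, m, r in zip(data, mask, result)]
        steps += 1
    if not all(mask):  # broadcast the "else" branch to the remaining processors
        result = [r if m else else_op(x) for x, m, r in zip(data, mask, result)]
        steps += 1
    return result, steps


# One Collatz step on each element: even -> x // 2, odd -> 3x + 1.
print(simd_branch([1, 2, 3, 4, 5, 6], lambda x: x % 2 == 0,
                  lambda x: x // 2, lambda x: 3 * x + 1))
# prints ([4, 1, 10, 2, 16, 3], 3)
```

With mixed data the array pays for both branches; iterate this until every element reaches 1 and the whole array runs as long as its slowest element, which is the lockstep price MIMD avoids.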