Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: massive parallelism, was  CDC 6600 and TI ASC
Message-ID: <MCCALPIN.91Mar8100954@pereland.cms.udel.edu>
Date: 8 Mar 91 15:09:54 GMT
References: <45252@ut-emx.uucp> <1991Mar7.215545.430@zoo.toronto.edu>
	<LAMSON.91Mar7181743@el1.crd.ge.com> <7491@mentor.cc.purdue.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 52
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: hrubin@pop.stat.purdue.edu's message of 8 Mar 91 13:47:05 GMT

>> On 8 Mar 91 13:47:05 GMT, hrubin@pop.stat.purdue.edu (Herman Rubin) said:

Herman> In article <LAMSON.91Mar7181743@el1.crd.ge.com>, lamson@el1.crd.ge.com (scott h lamson) writes:

> Given this line of reasoning, how do you look at massive parallelism a la
> the Connection Machine?  Should you think of the CM as a slow scalar
> machine with a super-fast, very long vector processor?
> Or is this maybe the wrong way to look at the machine to start with?

Herman> This is the wrong way to look at it, and there are huge
Herman> problems with massively parallel processors handling long
Herman> vectors.  Here is a simple example which will be relatively
Herman> poor on SIMD machines: there is a function to be computed on
Herman> all arguments of a vector.  There are different efficient
Herman> algorithms to be used in different parts of the domain, but
Herman> there is no common even moderately efficient algorithm.
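To make Herman's point concrete, here is a minimal Python sketch of a single-instruction-stream machine evaluating a piecewise function with masking. The three per-region algorithms, their formulas, and their cycle costs are all hypothetical. The point is only that the SIMD cost is the *sum* of the costs of every algorithm that any processor needs, while independent (MIMD) processors pay only the *maximum*.

```python
# Hypothetical per-region algorithms: (name, applies_to, evaluate, cost).
# The formulas and cycle costs are invented for illustration only.
ALGORITHMS = [
    ("near_zero", lambda x: abs(x) < 1.0, lambda x: 1.0 + x + x * x / 2.0, 10),
    ("mid_range", lambda x: 1.0 <= abs(x) < 10.0, lambda x: x * x, 40),
    ("far_out", lambda x: abs(x) >= 10.0, lambda x: 1.0 / x, 25),
]


def simd_evaluate(xs):
    """Single instruction stream: every algorithm that any processor needs
    is broadcast in turn, and a per-processor mask bit decides who executes."""
    results = [None] * len(xs)
    cycles = 0
    for _name, applies_to, evaluate, cost in ALGORITHMS:
        mask = [applies_to(x) for x in xs]
        if not any(mask):
            continue  # assume the front end can skip an algorithm nobody needs
        cycles += cost  # the whole machine pays, even masked-off processors
        for i, active in enumerate(mask):
            if active:
                results[i] = evaluate(xs[i])
    return results, cycles


def mimd_cycles(xs):
    """Independent processors: time is set by the slowest single element."""
    return max(cost
               for x in xs
               for _n, applies_to, _e, cost in ALGORITHMS
               if applies_to(x))
```

With xs = [0.5, 2.0, 20.0] the masked SIMD run costs 10 + 40 + 25 = 75 cycles, against 40 for the slowest element alone.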

That is why Danny Hillis originally wanted to have more than one
instruction stream on the Connection Machine.  I believe that he told
me that the original idea was for 4 instruction streams.  This would
require 2 bits for "instruction stream select" rather than the 1 bit
that is currently used for "masking".  This part of the overhead is
negligible.  The part that caused them to drop the idea was that the
front-end VAXen had enough trouble generating *one* instruction stream
fast enough to keep the machine busy --- it would have failed badly
trying to generate 4 independent instruction streams.
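A rough Python sketch of the difference between the 1-bit mask and a 2-bit stream-select field, assuming (hypothetically) that each processor holds its select value locally and that each stream broadcasts one operation per cycle. With one stream a processor either executes or idles; with k streams each processor executes the operation of whichever stream it has selected, and naming one of k streams takes ceil(log2 k) bits.

```python
from typing import Callable, List, Sequence

Op = Callable[[float], float]


def select_bits(num_streams: int) -> int:
    """Per-processor bits needed to name one of num_streams streams.
    A single stream still needs its 1 masking bit, hence the max()."""
    if num_streams < 1:
        raise ValueError("need at least one instruction stream")
    return max(1, (num_streams - 1).bit_length())


def masked_step(values: Sequence[float], mask: Sequence[bool], op: Op) -> List[float]:
    """One cycle with a single stream: masked-off processors keep their value."""
    return [op(v) if m else v for v, m in zip(values, mask)]


def multistream_step(values: Sequence[float], select: Sequence[int],
                     ops: Sequence[Op]) -> List[float]:
    """One cycle with len(ops) streams broadcasting at once: each processor
    applies the operation of the stream named by its select field."""
    return [ops[s](v) for v, s in zip(values, select)]
```

Note that the per-processor storage cost is trivial (select_bits(4) is 2, versus 1 for masking); the hard part, as the post says, is a front end that can feed all the streams at once.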

Of course, the other part of the story is that they saw no
overwhelming reason to include this added complexity in the first
generation of the machines.  Now that the CM-2 SIMD architecture has
proven itself successful, and now that *much* faster front-end machines
are available, Thinking Machines Corp. might be more receptive to user
requests for this sort of functionality.

The "multiple-instruction-stream" feature may or may not speed up a
particular application.  The best case is when the different
algorithms all require the same amount of time; then one obtains a
speedup equal to the lesser of the number of instruction streams and
the number of algorithms used.
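A back-of-the-envelope Python model of that best case, assuming stream switching is free and the front end keeps every stream fed (something the post says the VAX front ends could not have managed for 4 streams). With m equal-time algorithms, one stream needs m rounds and k streams need ceil(m/k) rounds, so the speedup is m / ceil(m/k). That equals the lesser of k and m whenever m <= k or m is a multiple of k, and is lower otherwise (5 algorithms on 4 streams gives 2.5, not 4). The second function handles unequal times for a caller-supplied, hypothetical assignment of algorithms to streams.

```python
def speedup_equal_time(num_streams: int, num_algorithms: int) -> float:
    """Every algorithm takes the same time T. One stream runs all m of them
    back to back (m*T); k streams run up to k at once, needing ceil(m/k) rounds."""
    if num_streams < 1 or num_algorithms < 1:
        raise ValueError("need at least one stream and one algorithm")
    rounds = (num_algorithms + num_streams - 1) // num_streams  # ceil(m/k)
    return num_algorithms / rounds


def speedup_assigned(times, stream_of):
    """times[a] is algorithm a's broadcast time; stream_of[a] is the stream it
    is assigned to. Single-stream time is the sum of all times; multi-stream
    time is the load on the busiest stream."""
    loads = {}
    for t, s in zip(times, stream_of):
        loads[s] = loads.get(s, 0) + t
    return sum(times) / max(loads.values())
```

For unequal times the busiest stream dominates: three algorithms costing 10, 40 and 25 on three separate streams give 75 / 40 = 1.875, not 3.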

One may quibble about the choice of 4 instruction streams.  I think
that this would handle most applications, though I might want 8
for some stuff I do (one "interior" instruction stream and 6 "boundary
condition" instruction streams for the faces of a three-dimensional
rectangular box: 7 streams in all, which needs 3 select bits and so
leaves room for 8).
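For the box example, a small Python sketch that labels every point of an nx x ny x nz grid with the stream that would handle it: 0 for the interior and 1 through 6 for the faces. Edge and corner points lie on more than one face; giving them to the first matching face in a fixed order is an arbitrary choice made here for illustration, not something the post specifies.

```python
from collections import Counter


def stream_for_point(i, j, k, nx, ny, nz):
    """0 = interior, 1..6 = the six faces of the box. Points on edges and
    corners go to the first matching face in this (arbitrary) order."""
    if i == 0:
        return 1
    if i == nx - 1:
        return 2
    if j == 0:
        return 3
    if j == ny - 1:
        return 4
    if k == 0:
        return 5
    if k == nz - 1:
        return 6
    return 0


def stream_counts(nx, ny, nz):
    """How many grid points each instruction stream would be responsible for."""
    return Counter(stream_for_point(i, j, k, nx, ny, nz)
                   for i in range(nx) for j in range(ny) for k in range(nz))


counts = stream_counts(4, 4, 4)
num_streams = len(counts)              # 7 whenever every side has at least 3 points
bits = (num_streams - 1).bit_length()  # 3 select bits, i.e. room for 8 streams
```

On a 4 x 4 x 4 grid only 8 of the 64 points are interior, so on a single-stream machine the six boundary broadcasts would each leave most processors masked off.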

Once one gets much more complicated than a "few" instruction streams,
the problem would probably map better onto a MIMD architecture.
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET