Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: massive parallelism, was CDC 6600 and TI ASC
Message-ID:
Date: 8 Mar 91 15:09:54 GMT
References: <45252@ut-emx.uucp> <1991Mar7.215545.430@zoo.toronto.edu> <7491@mentor.cc.purdue.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 52
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: hrubin@pop.stat.purdue.edu's message of 8 Mar 91 13:47:05 GMT

>> On 8 Mar 91 13:47:05 GMT, hrubin@pop.stat.purdue.edu (Herman Rubin) said:

Herman> In article , lamson@el1.crd.ge.com (scott h lamson) writes:

> Given this line of reasoning, how do you look at massive parallel ala
> the connection machine?  Should you think of the CM as a slow scalar
> machine with a super fast very long vector processor?
> or is this maybe the wrong way to look at the machine to start with.

Herman> This is the wrong way to look at it, and there are huge
Herman> problems with massively parallel processors handling long
Herman> vectors.  Here is a simple example which will be relatively
Herman> poor on SIMD machines: there is a function to be computed on
Herman> all arguments of a vector.  There are different efficient
Herman> algorithms to be used in different parts of the domain, but
Herman> there is no common even moderately efficient algorithm.

That is why Danny Hillis originally wanted to have more than one
instruction stream on the Connection Machine.  I believe that he told
me that the original idea was for 4 instruction streams.  This would
require 2 bits for "instruction stream select" rather than the 1 bit
that is currently used for "masking".  This part of the overhead is
negligible.
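
To put rough numbers on the masking cost, here is a toy timing model in
Python.  It is only a sketch under stated assumptions: the function
names and the per-algorithm times are made up for illustration, each
processor is assumed to need exactly one of the algorithms, and the
assignment of algorithms to streams is a simple greedy heuristic rather
than anything the Connection Machine actually does.

```python
def masked_time(times):
    """One instruction stream plus a mask bit: the algorithms are
    broadcast one after another, so their run times add up."""
    return sum(times)


def multistream_time(times, streams):
    """Several instruction streams: each algorithm is handed to the
    least-loaded stream (longest algorithm first, a greedy heuristic,
    not necessarily optimal).  The machine finishes when the busiest
    stream finishes."""
    loads = [0] * streams
    for t in sorted(times, reverse=True):
        i = loads.index(min(loads))  # pick the least-loaded stream
        loads[i] += t
    return max(loads)


def speedup(times, streams):
    """Ratio of masked (single-stream) time to multi-stream time."""
    return masked_time(times) / multistream_time(times, streams)


print(speedup([1, 1, 1], 4))     # 3 equal algorithms, 4 streams -> 3.0
print(speedup([1, 1, 1, 1], 4))  # 4 equal algorithms, 4 streams -> 4.0
print(speedup([4, 1, 1, 1], 4))  # one slow algorithm dominates  -> 1.75
```

In this model, equal-cost algorithms give a speedup of
min(streams, algorithms) whenever the algorithms fit in one pass or
divide evenly among the streams.  Unequal costs do worse, since the
slowest stream sets the pace.
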
The part that caused them to drop the idea was that the front-end
VAXen had enough trouble generating *one* instruction stream fast
enough to keep the machine busy; they would have failed badly trying
to generate 4 independent instruction streams.

Of course, the other part of the story is that they saw no
overwhelming reason to include this added complexity in the first line
of machines.  Now that the CM-2 SIMD architecture has proven itself
successful, and now that *much* faster front-end machines are
available, Thinking Machines Corp. might be more receptive to user
requests for this sort of functionality.

The "multiple-instruction-stream" feature may or may not speed up a
particular application.  The best case is when the different
algorithms all require the same amount of time.  Compared with running
the algorithms one after another under masking, one then obtains a
speedup equal to the lesser of the number of instruction streams or
the number of algorithms used.

One may quibble about the choice of 4 instruction streams.  I think
that this would handle most applications, though I might want 8 for
some stuff I do (one "interior" instruction stream and 6 "boundary
condition" instruction streams for the faces of a three-dimensional
rectangular box).  Once one gets much more complicated than a "few"
instruction streams, the problem would probably map better onto a MIMD
architecture.
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET