Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!bloom-beacon!primerd!hollin!ds From: ds@hollin.prime.com Newsgroups: comp.arch Subject: Re: parallel systems vs. uni-proces Message-ID: <57300001@hollin> Date: 24 Oct 89 23:26:00 GMT References: <1667@l.cc.purdue.edu> Lines: 44 Nf-ID: #R:l.cc.purdue.edu:-166700:hollin:57300001:000:2235 Nf-From: hollin.prime.com!ds Oct 24 19:26:00 1989 /* Written 5:06 pm Oct 24, 1989 by hsu@uicsrd.csrd.uiuc.edu in comp.arch */ In article <6655@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: > >Also, the job of programming a shared memory system is a lot harder than >programming a system with messages as the communication medium. Umm, have you tried? When I took a parallel programming class a couple years ago, everybody griped about the problems of coding Gaussian elimination for the hypercube. Hardly anybody griped about implementing it for the Sequent balance. (This single example of course doesn't prove anything, but I would like to see some examples of problems that are clearly easier to code for a message-passing system.) Bill /* End of text from hollin:comp.arch */ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ An example of a problem that is clearly easier to code in a message-passing system than in a shared-memory MIMD system like the Sequent is ordinary array (matrix) cross-product. The Sequent offers no particular support, according to its programming literature, other than doing unrelated cross-products simultaneously on different 386s. However, true data-parallel computers (defined as computers that are powerful enough to simulate a systolic 2-d grid efficiently) can implement the cross-product using an elegant linear-time algorithm (measured using the word, not bit, model) that simply feeds each array element into adjacent grid edges, one array into the rows and the other into the columns, with a delay of one time unit for each succeeding row (and the same for the columns). Each processor does an add and a multiply. Let me know if you want to know the details; you can work it out for yourself easily by examining the terms of a written-out symbolic cross-product. (Furthermore, this algorithm can also "pipeline" unrelated cross-products to reduce the multiplicative overhead to close to 1, although this is a tiny effect compared with the improvement from quadratic time to linear). David Spector Prime Computer, Inc. ds@primerd.prime.com (until the layoff, soon) (The above reflects my very scanty knowledge about parallel programming. Please let me know if I've misunderstood something, and thanks...)