Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!bloom-beacon!primerd!hollin!ds
From: ds@hollin.prime.com
Newsgroups: comp.arch
Subject: Re: parallel systems vs. uni-proces
Message-ID: <57300001@hollin>
Date: 24 Oct 89 23:26:00 GMT
References: <1667@l.cc.purdue.edu>
Lines: 44
Nf-ID: #R:l.cc.purdue.edu:-166700:hollin:57300001:000:2235
Nf-From: hollin.prime.com!ds    Oct 24 19:26:00 1989


/* Written  5:06 pm  Oct 24, 1989 by hsu@uicsrd.csrd.uiuc.edu in comp.arch */
In article <6655@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>
>Also, the job of programming a shared memory system is a lot harder than
>programming a system with messages as the communication medium. 

Umm, have you tried? 

When I took a parallel programming class a couple years ago, everybody
griped about the problems of coding Gaussian elimination for the
hypercube. Hardly anybody griped about implementing it for the
Sequent balance. (This single example of course doesn't prove
anything, but I would like to see some examples of problems that
are clearly easier to code for a message-passing system.)

Bill
/* End of text from hollin:comp.arch */

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An example of a problem that is clearly easier to code in a message-passing
system than in a shared-memory MIMD system like the Sequent is ordinary array
(matrix) cross-product.  The Sequent offers no particular support, according
to its programming literature, other than doing unrelated cross-products
simultaneously on different 386s.  However, true data-parallel computers
(defined as computers that are powerful enough to simulate a systolic 2-d grid
efficiently) can implement the cross-product using an elegant linear-time
algorithm (measured using the word, not bit, model) that simply feeds each
array element into adjacent grid edges, one array into the rows and the other
into the columns, with a delay of one time unit for each succeeding row (and
the same for the columns).  Each processor does an add and a multiply.  Let me
know if you want to know the details; you can work it out for yourself easily
by examining the terms of a written-out symbolic cross-product.  (Furthermore,
this algorithm can also "pipeline" unrelated cross-products to reduce the
multiplicative overhead to close to 1, although this is a tiny effect compared
with the improvement from quadratic time to linear).

David Spector
Prime Computer, Inc.
ds@primerd.prime.com (until the layoff, soon)

(The above reflects my very scanty knowledge about parallel programming.
Please let me know if I've misunderstood something, and thanks...)