Path: utzoo!utgpu!water!watmath!clyde!burl!codas!uflorida!gatech!hubcap!Bjorn
From: lisper@YALE.ARPA (Bjorn Lisper)
Newsgroups: comp.hypercube
Subject: Re: bandwidth balance
Message-ID: <982@hubcap.UUCP>
Date: 16 Feb 88 20:45:35 GMT
References: <975@hubcap.UUCP>
Sender: fpst@hubcap.UUCP
Lines: 34
Approved: hypercube@hubcap.clemson.edu

[ An editorial reminder:  You cannot include more text than
  you add - the mailer checks that, not me.  If it looks simple,
  I'll occasionally fix it for you, but I don't if it takes
  editorializing.  Please help out.

	Steve
]

In article <975@hubcap.UUCP> rmw6x@hudson.acc.virginia.edu (Robert M. Wise)
writes:
>In article <964@hubcap.UUCP> Donald.Lindsay@K.GP.CS.CMU.EDU writes:
>>  ....
>>I recently heard a talk by Gil Weigand of Sandia National Labs. He claims
>>considerable success in getting near-linear scaleup on his NCUBE/10.
>....
>I suspect that there are a lot of algorithms which benefit from this
>approach, although not as much as the matrix multiplication kind of
>thing.  Any thoughts on this?  Might make an interesting paper.
>Hmmmmm.  Never mind, I didn't say that...

Don't forget that you must not only put the results somewhere; the elements
of A must also be sent to the proper processors before the computation
starts. On the other hand, this distribution of data in "chunks" is what
gives you the speedup. This is especially true on an architecture where
there is a large startup cost for every message sent. Your scheme uses the
extra memory to make "chunk communication" possible, whereas a pipelined
scheme must send intermediate results every now and then. But your scheme
doesn't reduce the number of operations. The Laplacian solver mentioned
above is said to require 300 times less COMPUTATION in exchange for the
increased memory requirements. If that statement is to be taken literally,
then some mechanism other than communication savings must be responsible
for the speedup.
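
To illustrate the startup-cost argument, here is a minimal Python sketch. It
assumes a linear cost model: a fixed startup cost per message plus a per-word
transfer cost. The model, the function name and all the numbers are
illustrative assumptions, not measurements from the NCUBE or any other
machine.

```python
def transfer_time(total_words, n_messages, startup, per_word):
    """Time to move total_words using n_messages messages.

    Assumed model: every message pays the startup cost once, and every
    word pays the per-word cost once, however the words are grouped.
    """
    return n_messages * startup + per_word * total_words


# Hypothetical parameters, in arbitrary time units.
startup = 100.0    # fixed cost per message
per_word = 1.0     # cost per word transferred
total = 1000       # words that must be moved in total

# "Chunk communication": buffer everything in memory, send one message.
one_chunk = transfer_time(total, 1, startup, per_word)

# Pipelined extreme: send each intermediate result as its own message.
word_at_a_time = transfer_time(total, total, startup, per_word)

print("one chunk:     ", one_chunk)
print("word at a time:", word_at_a_time)
print("ratio:         ", word_at_a_time / one_chunk)
```

Under this model the chunked version wins only on communication time. The
number of words moved, and the number of arithmetic operations performed on
them, is the same in both cases.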

Bjorn Lisper