Path: utzoo!utgpu!water!watmath!clyde!burl!codas!uflorida!gatech!hubcap!Bjorn
From: lisper@YALE.ARPA (Bjorn Lisper)
Newsgroups: comp.hypercube
Subject: Re: bandwidth balance
Message-ID: <982@hubcap.UUCP>
Date: 16 Feb 88 20:45:35 GMT
References: <975@hubcap.UUCP>
Sender: fpst@hubcap.UUCP
Lines: 34
Approved: hypercube@hubcap.clemson.edu

[ An editorial reminder: You cannot include more text than you add - the
  mailer checks that, not me. If it looks simple, I'll occasionally fix it
  for you, but I don't if it takes editorializing. Please help out.
  Steve ]

In article <975@hubcap.UUCP> rmw6x@hudson.acc.virginia.edu (Robert M. Wise)
writes:
>In article <964@hubcap.UUCP> Donald.Lindsay@K.GP.CS.CMU.EDU writes:
>> ....
>>I recently heard a talk by Gil Weigand of Sandia National Labs. He claims
>>considerable success in getting near-linear scaleup on his NCUBE/10.
>....
>I suspect that there are a lot of algorithms which benefit from this
>approach, although not as much as the matrix multiplication kind of
>thing. Any thoughts on this? Might make an interesting paper.
>Hmmmmm. Never mind, I didn't say that...

Don't forget that you must not only put the results somewhere: the
elements of A must also be sent to the proper processors before the
computation starts. On the other hand, this distribution of data in
"chunks" is what gives you the speedup. This is especially true on an
architecture where every message sent carries a large startup cost.

Your scheme uses the extra memory to make "chunk communication" possible,
whereas a pipelined scheme must send intermediate results every now and
then. But it doesn't reduce the number of operations. The Laplacian solver
mentioned above is said to require 300 times less COMPUTATION in exchange
for the increased memory requirements. If that statement is to be taken
literally, then some mechanism other than communication savings must be
responsible for the speedup.

Bjorn Lisper
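
To illustrate the startup-cost argument above, here is a minimal sketch
of a linear message cost model (time = startup + per-word cost * words).
The model and all of the numbers in it are assumptions chosen for
illustration; they are not measurements from the NCUBE/10 or any other
machine, and the function names are hypothetical.

```python
def message_time(words, startup, per_word):
    """Time for one message under an assumed linear cost model."""
    return startup + per_word * words


def transfer_time(total_words, num_messages, startup, per_word):
    """Time to move total_words, split evenly over num_messages messages."""
    if num_messages < 1:
        raise ValueError("num_messages must be at least 1")
    words_each = total_words / num_messages
    return num_messages * message_time(words_each, startup, per_word)


# Hypothetical parameters, in arbitrary time units.
STARTUP = 400.0    # fixed cost paid once per message
PER_WORD = 2.0     # cost per word transferred
TOTAL_WORDS = 1024

# "Chunk communication": everything goes in one message.
chunked = transfer_time(TOTAL_WORDS, 1, STARTUP, PER_WORD)

# Pipelined style: the same data trickles out in 64 small messages.
piecewise = transfer_time(TOTAL_WORDS, 64, STARTUP, PER_WORD)

print("one chunk:   ", chunked)
print("64 messages: ", piecewise)
print("extra startup cost:", piecewise - chunked)
```

Under this model the per-word cost is identical in both cases, so the
whole difference is the startup cost paid 63 extra times. The number of
arithmetic operations does not appear in the model at all, which is why
chunking by itself cannot account for a reduction in computation.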