Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!purdue!gatech!hubcap!Donald.Lindsay
From: Donald.Lindsay@K.GP.CS.CMU.EDU
Newsgroups: comp.hypercube
Subject: bandwidth balance
Message-ID: <964@hubcap.UUCP>
Date: 11 Feb 88 19:10:10 GMT
Sender: fpst@hubcap.UUCP
Lines: 33
Approved: hypercube@hubcap.clemson.edu

When building a parallel machine, a designer chooses the balance between
computational resources and memory bandwidth. For example, both Intel
and Thinking Machines recently announced new hypercubes, which have about
the same memory bandwidth as the previous models, but with vector
arithmetic units spread through the cube.

In general, today's hypercubes are bandwidth-heavy compared to
conventional machines. A 256-node Butterfly has an 8K-bit-wide path to
memory. (Yes, I know it's not quite a cube.) A 1024-node NCUBE has a
16K-bit-wide path to memory. A 64K-processor Connection Machine has a
64K-bit-wide path. This is somewhat more than any Cray - regardless of
where in the Cray you choose to measure.

I recently heard a talk by Gil Weigand of Sandia National Labs. He
claims considerable success in getting near-linear scaleup on his
NCUBE/10. In particular, he mentioned a Laplacian solver which was
deliberately memory-intensive. It used 128 times the memory
(2MB --> 256MB) in return for 300 times less computation. He claimed
his time-to-result was dramatically better than on the Sandia Cray,
even though the Cray is superior in MFLOPS.

This raises several interesting questions.
- Could this algorithm work on the Cray, or is the massive memory
  bandwidth the whole secret?
- Is a 64-processor Cray-4 going to compare more favorably with the
  (bigger) cubes it will compete with?
- Can we find other problems that fall to such attacks?

I'd call this good news.

Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science
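
For anyone who wants to check the arithmetic behind the path widths quoted above, here is a rough sketch. The aggregate width is taken to be node count times per-node memory path width. The per-node widths (32, 16 and 1 bits) are my assumption, inferred from the quoted totals rather than stated in the post, and the names are purely illustrative.

```python
# Aggregate memory path width = number of nodes * per-node path width (bits).
# ASSUMPTION: the per-node widths below are inferred from the totals quoted
# in the post (8K, 16K, 64K bits); the post does not state them.
MACHINES = {
    "Butterfly (256 nodes)": (256, 32),
    "NCUBE (1024 nodes)": (1024, 16),
    "Connection Machine (64K processors)": (64 * 1024, 1),
}


def aggregate_width_bits(nodes, bits_per_node):
    """Total width of the path to memory, summed over all nodes."""
    return nodes * bits_per_node


def in_k(bits):
    """Express a bit count in units of K (1024) bits."""
    return bits // 1024


for name, (nodes, width) in MACHINES.items():
    total = aggregate_width_bits(nodes, width)
    print(f"{name}: {in_k(total)}K-bit-wide path")

# The memory trade mentioned for the Laplacian solver: 2MB --> 256MB.
memory_factor = (256 * 2**20) // (2 * 2**20)
print(f"memory factor: {memory_factor}x")
```

This only counts path width. It says nothing about cycle times or latency, so it is not a bandwidth figure in bytes per second.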