Path: utzoo!utgpu!water!watmath!clyde!bellcore!decvax!purdue!gatech!hubcap!Donald.Lindsay
From: Donald.Lindsay@K.GP.CS.CMU.EDU
Newsgroups: comp.hypercube
Subject: bandwidth balance
Message-ID: <964@hubcap.UUCP>
Date: 11 Feb 88 19:10:10 GMT
Sender: fpst@hubcap.UUCP
Lines: 33
Approved: hypercube@hubcap.clemson.edu


When building a parallel machine, a designer chooses the balance between
computational resources and memory bandwidth. For example, both Intel and
Thinking Machines recently announced new hypercubes that have about the
same memory bandwidth as the previous models, but add vector arithmetic
units spread through the cube.

In general, today's hypercubes are bandwidth-heavy compared to conventional
machines. A 256-node Butterfly has an 8K-bit-wide path to memory. (Yes, I
know it's not quite a cube.) A 1024-node NCUBE has a 16K-bit-wide path to
memory.  A 64K-processor Connection Machine has a 64K-bit-wide path. This is
somewhat more than any Cray - regardless of where in the Cray you choose to
measure.
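
To make the per-node picture concrete, here is a small sketch that divides
each aggregate path width quoted above by the node count. The figures are
the ones in this post; the function name bits_per_node is just an
illustrative choice, and the even split across nodes is an assumption.

```python
# (machine, node count, aggregate memory path width in bits), as quoted above
MACHINES = [
    ("Butterfly", 256, 8 * 1024),
    ("NCUBE", 1024, 16 * 1024),
    ("Connection Machine", 64 * 1024, 64 * 1024),
]


def bits_per_node(nodes, total_bits):
    """Width of the memory path each node contributes, assuming an even split."""
    return total_bits // nodes


for name, nodes, total_bits in MACHINES:
    print(f"{name}: {bits_per_node(nodes, total_bits)} bits per node")
```

The aggregate width comes from summing many narrow paths, not from one
wide one: 32 bits per Butterfly node, 16 per NCUBE node, and 1 per
Connection Machine processor.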

I recently heard a talk by Gil Weigand of Sandia National Labs. He claims
considerable success in getting near-linear scaleup on his NCUBE/10. In
particular, he mentioned a Laplacian solver which was deliberately memory
intensive. It used 128 times the memory (2MB --> 256MB) in return for a
300-fold reduction in computation. He claimed his time-to-result was
dramatically better than on the Sandia Cray, even though the Cray is
superior in MFLOPS.
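
A back-of-envelope sketch of that trade follows. It assumes run time is
simply operations divided by arithmetic rate, and that memory bandwidth
never becomes the bottleneck; both are my simplifications, not anything
claimed in the talk. The value of flops_ratio is hypothetical.

```python
def memory_factor(small_mb, large_mb):
    """How many times more memory the memory-intensive version uses."""
    return large_mb / small_mb


def relative_time(flops_ratio, compute_reduction):
    """Time of the memory-intensive run on the slower machine, divided by
    the time of the original algorithm on a machine that is flops_ratio
    times faster. Assumes time = operations / arithmetic rate."""
    return flops_ratio / compute_reduction


# 2MB --> 256MB, as quoted above
print(memory_factor(2, 256))

# Hypothetical: a rival with 10 times the MFLOPS running the original algorithm
print(relative_time(flops_ratio=10, compute_reduction=300))
```

Under these assumptions the memory-intensive version wins whenever the
rival's MFLOPS advantage is smaller than the 300-fold reduction in work.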

This raises several interesting questions.
- Could this algorithm work on the Cray, or is the massive memory bandwidth
  the whole secret?
- Is a 64-processor Cray-4 going to compare more favorably with the (bigger)
  cubes it will compete with?
- Can we find other problems that fall to such attacks?

I'd call this good news.


	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science