Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!ncrlnk!ncrcae!hubcap!byrd
From: byrd@portia.Stanford.EDU (Greg Byrd)
Newsgroups: comp.parallel
Subject: Re: scalability of n-cubes, meshes (was: IPSC Communications)
Summary: 2D not always best
Keywords: iPSC Parallel hypercubes meshes torus scalability
Message-ID: <7205@hubcap.clemson.edu>
Date: 27 Nov 89 13:40:53 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 51
Approved: parallel@hubcap.clemson.edu

In article <7178@hubcap.clemson.edu> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
{stuff deleted}
>
>It would *seem* to me that a 3D mesh is the only way to go
>because that's the highest dimensionality you can embed into
>a 3D reality.  You get constant time per hop, no problem.
>
>Now could somebody summarize Dally's argument that 2D meshes
>(and toruses) are better?


I assume you mean better than 3D?  That is not Dally's argument
at all.  The argument is that given constant bisection width,
lower dimensional k-ary n-cubes give better latency and throughput
than higher dimensional ones.  The main reason is that there are
fewer channels which contribute to the bisection, so you can make
them wider.

Latency is dependent on channel width, number of hops, and the size
of the message.  For lower dimensions, the number of hops goes up,
but so does the channel width.  So for a given number of nodes
(k^n), you trade off dimension (n) against radix (k) to get the 
best average latency.

Assuming that delay across a wire is constant (independent of the
length) then the optimal dimensionality for 256, 16K, and 1M nodes
is 2, 4, and 5, respectively.  If a logarithmic delay is assumed,
the optimal points are 2, 3, and 5.  If a linear delay is assumed,
then 2 is optimal for all three network sizes.

With regard to long wires for tori, one can "fold" the torus in
such a way that all wires are twice as long as the mesh, but without
any wires that must traverse the entire edge.

Note also that meshes have half the bisection width of tori, so
they can have twice the channel width, again at the expense of a
larger average number of hops.  (All of the latency numbers, by the
way, assume every message travels the "average" distance, and there
is no contention in the network.)

For more details, see Dally's thesis (CalTech, 1986 -- also published
by Kluwer) or see:

  William J. Dally, "Wire-Efficient VLSI Multiprocessor Communcation
  Networks," in _Advanced_Research_in_VLSI_, MIT Press, 1987,
  pp. 391-415.

-----------------------------------------------------------------------------
Greg Byrd                 byrd@sumex-aim.stanford.edu
Knowledge Systems Lab     701 Welch Rd., Bldg. C, Palo Alto, CA  94304
Stanford University       (415) 725-3849


Brought to you by Super Global Mega Corp .com