Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!ncrlnk!ncrcae!hubcap!byrd From: byrd@portia.Stanford.EDU (Greg Byrd) Newsgroups: comp.parallel Subject: Re: scalability of n-cubes, meshes (was: IPSC Communications) Summary: 2D not always best Keywords: iPSC Parallel hypercubes meshes torus scalability Message-ID: <7205@hubcap.clemson.edu> Date: 27 Nov 89 13:40:53 GMT Sender: fpst@hubcap.clemson.edu Lines: 51 Approved: parallel@hubcap.clemson.edu In article <7178@hubcap.clemson.edu> wilson@carcoar.Stanford.EDU (Paul Wilson) writes: {stuff deleted} > >It would *seem* to me that a 3D mesh is the only way to go >because that's the highest dimensionality you can embed into >a 3D reality. You get constant time per hop, no problem. > >Now could somebody summarize Dally's argument that 2D meshes >(and toruses) are better? I assume you mean better than 3D? That is not Dally's argument at all. The argument is that given constant bisection width, lower dimensional k-ary n-cubes give better latency and throughput than higher dimensional ones. The main reason is that there are fewer channels which contribute to the bisection, so you can make them wider. Latency is dependent on channel width, number of hops, and the size of the message. For lower dimensions, the number of hops goes up, but so does the channel width. So for a given number of nodes (k^n), you trade off dimension (n) against radix (k) to get the best average latency. Assuming that delay across a wire is constant (independent of the length) then the optimal dimensionality for 256, 16K, and 1M nodes is 2, 4, and 5, respectively. If a logarithmic delay is assumed, the optimal points are 2, 3, and 5. If a linear delay is assumed, then 2 is optimal for all three network sizes. With regard to long wires for tori, one can "fold" the torus in such a way that all wires are twice as long as the mesh, but without any wires that must traverse the entire edge. Note also that meshes have half the bisection width of tori, so they can have twice the channel width, again at the expense of a larger average number of hops. (All of the latency numbers, by the way, assume every message travels the "average" distance, and there is no contention in the network.) For more details, see Dally's thesis (CalTech, 1986 -- also published by Kluwer) or see: William J. Dally, "Wire-Efficient VLSI Multiprocessor Communcation Networks," in _Advanced_Research_in_VLSI_, MIT Press, 1987, pp. 391-415. ----------------------------------------------------------------------------- Greg Byrd byrd@sumex-aim.stanford.edu Knowledge Systems Lab 701 Welch Rd., Bldg. C, Palo Alto, CA 94304 Stanford University (415) 725-3849 Brought to you by Super Global Mega Corp .com