Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: Networking for Distributed Computing
Message-ID:
Date: 8 Apr 91 14:10:58 GMT
References: <1991Apr5.182853.20728@hubcap.clemson.edu> <12606@pt.cs.cmu.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 79
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: lindsay@gandalf.cs.cmu.edu's message of 7 Apr 91 17:50:25 GMT

> On 7 Apr 91 17:50:25 GMT, lindsay@gandalf.cs.cmu.edu (Donald Lindsay) said:

I wrote:
me> Unfortunately, the most commonly available networking option
me> (ethernet) uses a broadcast approach, which is definitely
me> sub-optimal for the communications needs of many "natural"
me> parallel distributed algorithms.

Donald> Actually, shared-channel broadcast is optimal, as long as
Donald> - you don't run out of bandwidth (or pile up big latencies).
Donald> - detecting that a message isn't for you, doesn't impact your
Donald>   performance.

The shared broadcast channel might be "optimal" for some hypothetical
general problem, but if the problem naturally requires only
nearest-neighbor communications, then I can't see how broadcast is
more "optimal" than nearest-neighbor point-to-point communications.

The potential problem with the broadcast net is that in a
well-balanced system, all the processors are going to try to dump all
their communications at once, thus saturating the network. I am not
sure how to take this properly into account in the performance
analysis....

Donald> The trouble with point-to-point links is that you wind up implementing
Donald> message forwarding, the downside being
Donald> - more code
Donald> - more latency
Donald> - performance impact on the intermediaries

No! I have no interest in message forwarding! That is what the
ethernet is for.
The SCSI-based network idea is strictly intended as a supplement to
allow more networks to be in operation for nearest-neighbor
communications.

me> A suitably designed code (for example a 3-D spectral element code
me> for fluid dynamics using explicit time marching techniques) should
me> be capable of 1 GFLOPS performance on a network of 32 IBM
me> RS/6000-320's.

Donald> Kung's "Law" says that if you scale node performance, without
Donald> increasing communication bandwidth, then nodes require more
Donald> memory: N, N^2 or even N^3 as much, depending on algorithm.

Donald> Before choosing a communications setup, I would want to study
Donald> your application's characteristics, and work up some ratios
Donald> and granularities.

Well, I have done a fair bit of work on this. The work per node per
step is:

	(FP ops)/node/step = 140 L M N^2

while the communication required per interface per step is

	(64-bit words read/written)/side/step = 7 * (L,M)*N

where (L,M) means either L or M, depending on what side one is
communicating through. "Interior" nodes will have 4 "sides", "edge"
nodes will have 3 "sides", and "corner" nodes will have 2 "sides".

I envision a 4x4 mesh of nodes, with L & M between 25 and 75 and N
being between 5 and 12. This gives node computation times (on a 5-10
MFLOPS cpu) of near 1 second between communications.

The part of the problem that I do not know how to model is the time
required for the communications part. If it were point-to-point, I
would use a latency plus a quantity of data divided by the transfer
rate. With a broadcast network, I do not know how to model the
reduction of the transfer rate caused by network saturation....
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET
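
For anyone who wants to plug numbers into the estimates above, here is a
rough sketch in Python. The two workload formulas and the point-to-point
model (latency plus data divided by transfer rate) are taken from the
post. Everything else is an assumption made for illustration: the
function names, the latency and bandwidth figures, the choice L = M, and
especially shared_channel_floor(), which is only a crude no-collision
floor (all traffic serialized over one channel) and not a model of
saturation.

```python
def flops_per_step(L, M, N):
    """FP ops per node per step: 140 * L * M * N^2 (formula from the post)."""
    return 140 * L * M * N ** 2


def words_per_side(edge, N):
    """64-bit words per side per step: 7 * edge * N, where edge is L or M."""
    return 7 * edge * N


def compute_time(L, M, N, mflops):
    """Seconds of computation per step on a node sustaining `mflops` MFLOPS."""
    return flops_per_step(L, M, N) / (mflops * 1e6)


def p2p_time(words, latency_s, bytes_per_s):
    """Point-to-point estimate: latency + data / transfer rate (8 bytes/word)."""
    return latency_s + 8 * words / bytes_per_s


def mesh_sides(rows, cols):
    """Total node 'sides' in a rows x cols mesh (each internal link has two)."""
    links = rows * (cols - 1) + cols * (rows - 1)
    return 2 * links


def shared_channel_floor(total_words, bytes_per_s):
    """ASSUMPTION: crude floor for a shared channel, all traffic sent
    back to back with no collisions or retries. Real contention is worse."""
    return 8 * total_words / bytes_per_s


# Hypothetical mid-range case: L = M = 50, N = 8, 10 MFLOPS nodes.
L = M = 50
N = 8
side_words = words_per_side(L, N)

print("compute s/step:     ", compute_time(L, M, N, mflops=10))
# Latency (1 ms) and link rate (1 MB/s) are placeholders, not measurements.
print("p2p s/side:         ", p2p_time(side_words, 1e-3, 1e6))
# 4x4 mesh; assumes every side's words cross the shared channel once.
total = mesh_sides(4, 4) * side_words
print("shared floor s/step:", shared_channel_floor(total, 1e6))
```

The gap between the per-side point-to-point figure and the shared-channel
floor is the serialization cost alone; the extra loss from saturation is
exactly the part left unmodeled above.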