Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: Networking for Distributed Computing
Message-ID: <MCCALPIN.91Apr8101058@pereland.cms.udel.edu>
Date: 8 Apr 91 14:10:58 GMT
References: <1991Apr5.182853.20728@hubcap.clemson.edu> <12606@pt.cs.cmu.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 79
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: lindsay@gandalf.cs.cmu.edu's message of 7 Apr 91 17:50:25 GMT

> On 7 Apr 91 17:50:25 GMT, lindsay@gandalf.cs.cmu.edu (Donald Lindsay) said:

I wrote:
me> Unfortunately, the most commonly available networking option
me> (ethernet) uses a broadcast approach, which is definitely
me> sub-optimal for the communications needs of many "natural"
me> parallel distributed algorithms.

Donald> Actually, shared-channel broadcast is optimal, as long as
Donald>  - you don't run out of bandwidth (or pile up big latencies).
Donald>  - detecting that a message isn't for you doesn't impact your
Donald>    performance.

The shared broadcast channel might be "optimal" for some hypothetical
general problem, but if the problem naturally requires only
nearest-neighbor communications, then I can't see how broadcast is
more "optimal" than nearest-neighbor point-to-point communications.

The potential problem with the broadcast net is that in a
well-balanced system all the processors finish computing at about the
same time and then try to dump all their communications at once, thus
saturating the network.  I am not sure how to account for this
properly in the performance analysis....


Donald> The trouble with point-to-point links is that you wind up implementing
Donald> message forwarding, the downside being
Donald> - more code
Donald> - more latency
Donald> - performance impact on the intermediaries

No!  I have no interest in message forwarding!  That is what the
ethernet is for.  The SCSI-based network idea is strictly a
supplement: it puts more networks in operation at once, dedicated to
nearest-neighbor communications.


me> A suitably designed code (for example a 3-D spectral element code
me> for fluid dynamics using explicit time marching techniques) should
me> be capable of 1 GFLOPS performance on a network of 32 IBM
me> RS/6000-320's.

Donald> Kung's "Law" says that if you scale node performance, without
Donald> increasing communication bandwidth, then nodes require more
Donald> memory: N, N^2 or even N^3 as much, depending on algorithm.
Donald> Before choosing a communications setup, I would want to study
Donald> your application's characteristics, and work up some ratios
Donald> and granularities.

Well, I have done a fair bit of work on this.  The work per node per
step is:

	(FP ops)/node/step = 140 L M N^2

while the communication required per interface per step is 

	(64-bit words read/written)/side/step = 7 * (L,M)*N

where (L,M) means either L or M, depending on which side one is
communicating through.
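Dividing the work per node by the data moved through one side gives the kind of computation-to-communication ratio Donald suggests working up.  Written out for a side of length L (the M case is symmetric):

```latex
\frac{140\,L M N^2}{7\,L N} = 20\,M N
\quad\text{(FP ops per 64-bit word through an } L\text{ side)},
\qquad
\frac{140\,L M N^2}{7\,M N} = 20\,L N
\quad\text{(through an } M\text{ side)}.
```

The ratio grows linearly in both the patch size along the other direction and N; for illustration, L = M = 50 and N = 8 (values picked from within the ranges given in this post) give 8000 FP ops per word moved through each side.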

"Interior" nodes will have 4 "sides", "edge" nodes will have 3
"sides", and "corner" nodes will have 2 "sides".

I envision a 4x4 mesh of nodes, with L and M between 25 and 75 and N
between 5 and 12.  This gives node computation times (on a 5-10
MFLOPS CPU) on the order of 1 second between communications.
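A minimal sketch that tabulates the cost model above for the 4x4 mesh: computation time per step at 5 and 10 MFLOPS, and words exchanged per step by corner, edge, and interior nodes.  The function names, the sample (L, N) points, and the choice of which mesh direction carries sides of extent L versus M are illustrative assumptions, not part of the original analysis; the sample points use L = M, so that last choice does not affect the numbers printed.

```python
# Sketch of this post's per-step cost model for a P x Q mesh of nodes.
# Assumptions (illustrative, not from the post): non-periodic mesh; each
# node holds an L x M patch; neighbours along the i direction share a side
# of extent M and along the j direction a side of extent L; the CPU
# sustains a fixed MFLOPS rate.

def flops_per_step(L: int, M: int, N: int) -> int:
    """FP operations per node per time step: 140 L M N^2."""
    return 140 * L * M * N**2


def compute_seconds(L: int, M: int, N: int, mflops: float) -> float:
    """Computation time per step at a sustained rate of `mflops`."""
    return flops_per_step(L, M, N) / (mflops * 1e6)


def words_per_side(extent: int, N: int) -> int:
    """64-bit words read/written through one side per step: 7 * extent * N."""
    return 7 * extent * N


def sides(i: int, j: int, P: int, Q: int) -> int:
    """Number of neighbours: 4 for interior, 3 for edge, 2 for corner nodes."""
    return int(i > 0) + int(i < P - 1) + int(j > 0) + int(j < Q - 1)


def node_words(i: int, j: int, P: int, Q: int, L: int, M: int, N: int) -> int:
    """Words exchanged per step by node (i, j), summed over its sides."""
    across_i = int(i > 0) + int(i < P - 1)   # sides of extent M (assumed)
    across_j = int(j > 0) + int(j < Q - 1)   # sides of extent L (assumed)
    return across_i * words_per_side(M, N) + across_j * words_per_side(L, N)


if __name__ == "__main__":
    P = Q = 4                                            # the 4x4 mesh
    for L, N in [(25, 5), (40, 7), (50, 8), (75, 12)]:   # sample points, L = M
        fast = compute_seconds(L, L, N, 10.0)
        slow = compute_seconds(L, L, N, 5.0)
        print(f"L=M={L:2d} N={N:2d}: compute {fast:5.2f}-{slow:5.2f} s; words "
              f"corner {node_words(0, 0, P, Q, L, L, N)}, "
              f"edge {node_words(0, 1, P, Q, L, L, N)}, "
              f"interior {node_words(1, 1, P, Q, L, L, N)}")
```

With these formulas the roughly 1 second figure corresponds to the middle of the quoted ranges (L = M = 40, N = 7 at 10 MFLOPS gives about 1.1 s), while the extremes of the ranges span roughly 0.2 s to 23 s per step.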

The part of the problem that I do not know how to model is the time
required for communication.  If it were point-to-point, I would model
it as a latency plus the amount of data divided by the transfer
rate.  With a broadcast network, I do not know how to model the
reduction of the effective transfer rate caused by network saturation....
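One crude way to set up the comparison is sketched below.  Point-to-point uses the latency-plus-data/rate model per node; the shared channel is treated as serializing every transfer in the whole mesh, with all contention effects lumped into a single efficiency factor.  The 1 ms latency, the 5 MB/s per-link rate, and the efficiency values are hypothetical placeholders, and the efficiency factor stands in for a real saturation model rather than being one; only the 10 Mbit/s nominal ethernet rate is a known figure.

```python
# Crude communication-time sketch for a P x Q mesh, each node sending
# words_per_side 64-bit words through each of its sides once per step.
# Assumptions (all hypothetical, not from the original post):
#   * point-to-point: a node drives its own links one side after another
#     (receiving concurrently), so its time is
#     sides * latency + bytes / link_rate;
#   * shared broadcast channel: every transfer in the mesh is serialized
#     on one wire, and contention is folded into an efficiency 0 < eff <= 1.

BYTES_PER_WORD = 8  # 64-bit words


def transfers_in_mesh(P: int, Q: int) -> int:
    """One-way side transfers per step (two per shared interface)."""
    interfaces = P * (Q - 1) + Q * (P - 1)
    return 2 * interfaces


def p2p_seconds(sides: int, words_per_side: int,
                latency: float, link_rate: float) -> float:
    """Time for one node to send on its dedicated links."""
    return sides * latency + sides * words_per_side * BYTES_PER_WORD / link_rate


def shared_seconds(P: int, Q: int, words_per_side: int,
                   latency: float, bus_rate: float, eff: float) -> float:
    """All transfers serialized on one channel; eff = 1.0 is the ideal case."""
    n = transfers_in_mesh(P, Q)
    total_bytes = n * words_per_side * BYTES_PER_WORD
    return n * latency + total_bytes / (eff * bus_rate)


if __name__ == "__main__":
    words = 7 * 50 * 8     # 7 * L * N with L = 50, N = 8 (example values)
    ether = 10e6 / 8       # nominal 10 Mbit/s ethernet, in bytes/s
    link = 5e6             # hypothetical per-link rate, bytes/s
    lat = 1e-3             # hypothetical per-message latency, s
    print("transfers per step:", transfers_in_mesh(4, 4))
    print(f"point-to-point, interior node: {p2p_seconds(4, words, lat, link):.3f} s")
    for eff in (1.0, 0.5, 0.3):
        t = shared_seconds(4, 4, words, lat, ether, eff)
        print(f"shared ethernet, eff={eff:.1f}: {t:.3f} s")
```

Even in the ideal eff = 1.0 case, serializing the 48 transfers of the 4x4 mesh on one 10 Mbit/s wire takes about 0.9 s for these example values, compared with about 2.2 to 4.5 s of computation per step at the same L, M, N on a 5-10 MFLOPS CPU.  The efficiency factor is the place where a proper contention model (for example a CSMA/CD analysis) would plug in.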
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET