Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!uakari.primate.wisc.edu!samsung!emory!hubcap!fpst
From: grunwald@foobar.colorado.edu (Dirk Grunwald)
Newsgroups: comp.parallel
Subject: Re: iPSC/860 Communication Performance
Message-ID: <1991May1.173018.18757@colorado.edu>
Date: 1 May 91 17:30:18 GMT
References: <1991Apr23.123808.10313@hubcap.clemson.edu>
Sender: news@colorado.edu (The Daily Planet)
Reply-To: grunwald@foobar.colorado.edu
Organization: University of Colorado at Boulder
Lines: 43
Approved: parallel@hubcap.clemson.edu
In-Reply-To: wangjw@cs.purdue.edu's message of 22 Apr 91 20:14:05 GMT
Nntp-Posting-Host: foobar.colorado.edu

>>>>> On 22 Apr 91 20:14:05 GMT, wangjw@cs.purdue.edu (Jingwen Wang) said:
	...
JW>   The broadcast function is effected by the csend or isend call with the
JW> destination parameter specified as -1. We compared the speed of this call
JW> with the most simple method -- using a loop sending a separate message to 
JW> every other node. The average elapsed time from sending to receiving for 
JW> broadcast call is around 1/2 of the looping method for message length of 
JW> 500. This is quite good since it really speeds up the communication. It
JW> is also possible to broadcast to a subcube of nodes.
--

The 'simplest method' is not to send to every node in the system; it's
to use a broadcast tree, making broadcast an O(lg N) process, not
O(N). That's what the Intel O/S does. Taking the speedup over the
looping method as roughly N/lg N, your factor of 2 would then hold
only for N=4. For N=8 it's about 2.7, for N=16 it's 4 times faster
to use a tree, etc.
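
To make the tree argument concrete, here is a minimal sketch of a
binomial broadcast tree on a hypercube. It is only an illustration of
the O(lg N) idea, not the actual Intel O/S code; the function names
(binomial_broadcast_schedule, estimated_speedup) are made up for this
example, and the N/lg N ratio is the rough step-count estimate used
above, ignoring per-message overheads.

```python
import math


def binomial_broadcast_schedule(n_nodes, root=0):
    """Return a list of rounds; each round is a list of (src, dst) sends.

    Assumes n_nodes is a power of two (a hypercube). In round d every
    node that already holds the message forwards it to its neighbour
    across dimension d, so each send is a single hop.
    """
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("n_nodes must be a power of two")
    dims = n_nodes.bit_length() - 1
    have = {root}
    rounds = []
    for d in range(dims):
        sends = [(src, src ^ (1 << d)) for src in sorted(have)]
        have.update(dst for _, dst in sends)
        rounds.append(sends)
    return rounds


def estimated_speedup(n_nodes):
    """Rough tree-vs-loop ratio N / lg N (step counts only)."""
    return n_nodes / math.log2(n_nodes)


if __name__ == "__main__":
    for n in (4, 8, 16):
        rounds = binomial_broadcast_schedule(n)
        print(n, "nodes:", len(rounds), "rounds,",
              "estimated speedup %.2f" % estimated_speedup(n))
```

Every (src, dst) pair in the schedule differs in exactly one address
bit, which is why the tree never needs more than nearest-neighbour
links.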

From your value of '2' I assume you used a 4-node system, or possibly
8 nodes with imprecision in your results.

And of course, circuit-switched networks offer *no* benefit for
broadcast trees, because bcast trees use single-hop communication. The
only way to improve bcast is to add a communication processor to
offload the store & replicate work.

JW>   The multicast function of the i860 machine has exactly the same speed
JW> as the simplest looping method (send a separate message to each 
JW> destination). The only benefit is the simplification of expression (but
JW> you have to prepare a destination list which is at least as complicated
JW> as a loop to send the message separately). This is difficult to improve
JW> because the destinations are supposed to be arbitrary rather than regular, 
JW> as opposed to broadcast.
--

Again, if you're sending to more than lg N nodes, you're better off
(modulo the fact that you interrupt everyone) simply broadcasting to
the entire network & having uninterested nodes drop the message. You
can obviously improve on this with any of the multicast tree
algorithms that exist.
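
As a rough sketch of that rule of thumb: the helper below (a
hypothetical name, multicast_plan) counts only sequential steps at the
sender, k for a loop over k destinations versus lg N rounds for a tree
broadcast. It deliberately ignores the cost of interrupting
uninterested nodes, so treat it as a back-of-the-envelope model and
not a measurement of the iPSC/860.

```python
import math


def multicast_plan(destinations, n_nodes):
    """Pick 'loop' or 'broadcast_and_drop' by comparing step counts.

    destinations: iterable of node ids that want the message.
    n_nodes: total nodes in the machine (assumed a power of two).
    Returns (strategy, estimated_steps).
    """
    lg_n = math.log2(n_nodes)
    k = len(set(destinations))
    if k > lg_n:
        # A tree broadcast reaches everyone in lg N rounds; nodes not
        # in the destination set simply discard the message.
        return ("broadcast_and_drop", math.ceil(lg_n))
    # Few destinations: one send per destination is no worse.
    return ("loop", k)


if __name__ == "__main__":
    print(multicast_plan([1, 2, 3], 16))
    print(multicast_plan([1, 2, 3, 5, 9], 16))
```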

Of course, wormhole networks 'suffer' from these same problems,
because you're not using the feature these routing methods attempt to
make efficient - point-to-point communication.

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell