Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!uakari.primate.wisc.edu!samsung!emory!hubcap!fpst
From: grunwald@foobar.colorado.edu (Dirk Grunwald)
Newsgroups: comp.parallel
Subject: Re: iPSC/860 Communication Performance
Message-ID: <1991May1.173018.18757@colorado.edu>
Date: 1 May 91 17:30:18 GMT
References: <1991Apr23.123808.10313@hubcap.clemson.edu>
Sender: news@colorado.edu (The Daily Planet)
Reply-To: grunwald@foobar.colorado.edu
Organization: University of Colorado at Boulder
Lines: 43
Approved: parallel@hubcap.clemson.edu
In-Reply-To: wangjw@cs.purdue.edu's message of 22 Apr 91 20:14:05 GMT
Nntp-Posting-Host: foobar.colorado.edu

>>>>> On 22 Apr 91 20:14:05 GMT, wangjw@cs.purdue.edu (Jingwen Wang) said:
...
JW> The broadcast function is effected by the csend or isend call with the
JW> destination parameter specified as -1. We compared the speed of this call
JW> with the simplest method: using a loop that sends a separate message to
JW> every other node. The average elapsed time from sending to receiving for
JW> the broadcast call is around 1/2 of the looping method for a message
JW> length of 500. This is quite good, since it really speeds up the
JW> communication. It is also possible to broadcast to a subcube of nodes.

The 'simplest method' is not to send a separate message to every node in
the system; it is to use a broadcast tree, which makes broadcast an
O(lg N) process rather than O(N). That is what the Intel O/S does.

With a tree taking lg N steps against N for the loop, your factor-of-2
speedup over the looping method would hold only for N=4 (4 / lg 4 = 2).
For N=8 the tree is about 2.7 times faster (8/3), for N=16 it is 4 times
faster (16/4), and so on. From your value of '2' I assume you used a
4-node system, or possibly 8 nodes with some imprecision in your results.

And of course, circuit-switched networks give broadcast trees *no*
benefit, because broadcast trees use only single-hop communication. The
only way to improve broadcast is to add a communication processor to
offload the store-and-replicate work.
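For readers who want to see why the tree is O(lg N), here is a minimal simulation of a binomial-tree broadcast on a hypercube. It is a sketch under stated assumptions, not the Intel O/S implementation: N is assumed to be a power of two, nodes are numbered 0..N-1, and each node is assumed to send one message per round. The function name `hypercube_broadcast_schedule` is hypothetical. In round i every node that already holds the message forwards it across dimension i, so each send goes to a direct neighbour (a single hop) and the number of holders doubles each round. The printed ratio is the rough N / lg N estimate used above; counting exactly N - 1 sends for the loop gives a somewhat smaller ratio.

```python
# Minimal simulation of a binomial-tree broadcast on a hypercube.
# Assumptions (not from the post): N is a power of two, nodes are numbered
# 0..N-1, and every node can send one message per round.

def hypercube_broadcast_schedule(n_nodes, root=0):
    """Return the broadcast as a list of rounds of (src, dst) sends."""
    dim = n_nodes.bit_length() - 1          # lg N
    if n_nodes < 1 or (1 << dim) != n_nodes:
        raise ValueError("n_nodes must be a power of two")
    have = {root}                           # nodes that already hold the message
    rounds = []
    for i in range(dim):
        # Every holder forwards across dimension i; src and dst differ in
        # exactly one bit, so each send is a single hop.
        sends = [(src, src ^ (1 << i)) for src in sorted(have)]
        have.update(dst for _, dst in sends)
        rounds.append(sends)
    return rounds


if __name__ == "__main__":
    for n in (4, 8, 16):
        rounds = len(hypercube_broadcast_schedule(n))
        # The looping method needs N - 1 sequential sends from the root.
        print(f"N={n}: tree rounds={rounds}, loop sends={n - 1}, "
              f"N/lg N={n / rounds:.2f}")
    # N=4: tree rounds=2, loop sends=3, N/lg N=2.00
    # N=8: tree rounds=3, loop sends=7, N/lg N=2.67
    # N=16: tree rounds=4, loop sends=15, N/lg N=4.00
```

The tree still performs N - 1 sends in total; the saving comes from doing them in parallel, lg N rounds deep, rather than serially from one node.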
JW> The multicast function of the i860 machine has exactly the same speed
JW> as the simplest looping method (send a separate message to each
JW> destination). The only benefit is the simplification of expression (but
JW> you have to prepare a destination list, which is at least as complicated
JW> as a loop that sends the message separately). This is difficult to
JW> improve because the destinations are supposed to be arbitrary rather than
JW> regular, as opposed to broadcast.

Again, if you are sending to more than lg N nodes, you are better off
(modulo the fact that you interrupt everyone) simply broadcasting to the
entire network and having uninterested nodes drop the message. You can
obviously improve on this with any of a variety of existing multicast
tree algorithms. Of course, wormhole networks 'suffer' from these same
problems, because these methods do not use the feature wormhole routing
is designed to make efficient: point-to-point communication.

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell
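The multicast rule of thumb in the reply (broadcast once the destination count exceeds lg N) can be written as a tiny decision function. This is a hedged sketch with a hypothetical cost model, not a measured one: a separate send is counted as one step, a hypercube tree broadcast as lg N steps, and the cost of interrupting uninterested nodes is ignored. The name `choose_multicast` is illustrative only.

```python
# Hypothetical cost model for the multicast rule of thumb: a separate send
# costs one step; a hypercube tree broadcast costs lg N steps but interrupts
# every node. Real costs depend on message length and OS overhead.

def choose_multicast(n_nodes, destinations):
    """Return ('loop', steps) or ('broadcast', steps) for reaching destinations."""
    lg_n = n_nodes.bit_length() - 1
    if n_nodes < 1 or (1 << lg_n) != n_nodes:
        raise ValueError("n_nodes must be a power of two")
    k = len(set(destinations))              # distinct destination nodes
    if k > lg_n:
        # Cheaper to broadcast and let uninterested nodes drop the message.
        return ("broadcast", lg_n)
    return ("loop", k)


if __name__ == "__main__":
    print(choose_multicast(16, [3, 5]))          # ('loop', 2)
    print(choose_multicast(16, range(1, 9)))     # ('broadcast', 4)
```

A real multicast tree algorithm would do better than either branch for many destination sets; this only captures the crossover point described above.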