Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!wrdis01!gatech!hubcap!fpst
From: wangjw@cs.purdue.edu (Jingwen Wang)
Newsgroups: comp.parallel
Subject: iPSC/860 Communication Performance
Message-ID: <1991Apr23.123808.10313@hubcap.clemson.edu>
Date: 22 Apr 91 20:14:05 GMT
Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
Reply-To: wangjw@cs.purdue.edu ()
Organization: Department of Computer Science, Purdue University
Lines: 52
Approved: parallel@hubcap.clemson.edu

The iPSC/860 hypercube is noted for its very fast processor and circuit-switched communication. As we recently measured, its processor is roughly 9 to 10 times faster than that of the Ncube/2. Communication speed, however, has not improved at anything like a comparable rate. This disappoints people who move their iPSC/2 or Ncube code to this machine, because the speedup drops dramatically.

At first we were amazed to see that broadcast and multicast, as well as many other global communication calls, are provided on the iPSC/860. After some experiments, however, we found that most global communication calls are not as efficient as we had expected.

Broadcast is performed by a csend or isend call with the destination parameter set to -1. We compared the speed of this call with the simplest method: a loop that sends a separate message to every other node. For a message length of 500, the average elapsed time from sending to receiving with the broadcast call is about half that of the loop method. This is quite good, since it really speeds up the communication. It is also possible to broadcast to a subcube of nodes.

The multicast function of the i860 machine runs at exactly the same speed as the simplest loop method (sending a separate message to each destination). Its only benefit is a simpler expression, and even that is limited, because you have to prepare a destination list, which is at least as much work as writing a loop that sends the message separately.
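To make the broadcast versus loop comparison concrete, here is a minimal Python sketch that counts sends under an idealized model. It assumes, which the post does not state, that the library broadcast follows a spanning-tree scheme on the hypercube, and that every send costs one time unit with no link contention or start-up overhead; the function names `loop_broadcast_sends` and `tree_broadcast` are hypothetical. Under that model the root in the loop method issues p - 1 sequential sends, while the tree finishes in log2 p parallel steps because every node that already has the message forwards it across one more dimension.

```python
def loop_broadcast_sends(dim: int) -> int:
    """Loop method: the root sends one message to each other node in turn.

    On a dim-dimensional hypercube (p = 2**dim nodes) that is p - 1
    sequential sends, all issued by the root.
    """
    return (1 << dim) - 1


def tree_broadcast(dim: int) -> list[list[tuple[int, int]]]:
    """Idealized spanning-tree broadcast from node 0 (an assumed scheme).

    In step k, every node that already holds the message and whose bit k
    is 0 forwards it across dimension k to its neighbour src ^ (1 << k).
    Returns the (src, dst) sends performed in each parallel step.
    """
    p = 1 << dim
    has_msg = [False] * p
    has_msg[0] = True
    schedule = []
    for k in range(dim):
        # Build this step's sends from the holders before it starts.
        step = [(src, src ^ (1 << k)) for src in range(p)
                if has_msg[src] and not src & (1 << k)]
        for _, dst in step:
            has_msg[dst] = True
        schedule.append(step)
    assert all(has_msg), "every node should have received the message"
    return schedule


if __name__ == "__main__":
    for dim in range(1, 6):
        steps = tree_broadcast(dim)
        print(f"{1 << dim:3d} nodes: loop = {loop_broadcast_sends(dim):3d} "
              f"sequential sends, tree = {len(steps)} parallel steps")
```

This step count is only a model. The measured factor of about 2 reported above need not match it, since real times also depend on per-message start-up cost, contention, and how the library actually implements broadcast. For a multicast to k arbitrary destinations, the same model gives k sequential sends from the source, consistent with the observation that multicast runs at loop speed: the tree relies on the regular hypercube structure that an arbitrary destination list lacks.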
Multicast is difficult to improve because its destinations are arbitrary rather than regular, as opposed to broadcast.

There are also global collection routines, which collect a contribution of data from each node; after the operation, every node holds a copy of the whole collection. On store-and-forward networks such an operation substantially reduces communication time (it takes about the same time as a single broadcast). On the iPSC/860, however, it is even slightly slower than having each node send a broadcast message, which effects a full exchange.

It seems that the only benefit of circuit-switched networks in global communication is broadcast, which saves time by sending messages simultaneously over several channels. Of course, for point-to-point communication, circuit-switched networks are certainly better than store-and-forward message passing.

The above comments are only some negative points about circuit-switched networks. Some experts are obviously overly optimistic about the performance of such networks. They need to be improved, too.

Jingwen Wang
Dept. CS. Purdue University
wangjw@cs.purdue.edu
--
=========================== MODERATOR ==============================
Steve Stevenson {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell