Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!wrdis01!gatech!hubcap!fpst
From: wangjw@cs.purdue.edu (Jingwen Wang)
Newsgroups: comp.parallel
Subject: iPSC/860 Communication Performance
Message-ID: <1991Apr23.123808.10313@hubcap.clemson.edu>
Date: 22 Apr 91 20:14:05 GMT
Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
Reply-To: wangjw@cs.purdue.edu ()
Organization: Department of Computer Science, Purdue University
Lines: 52
Approved: parallel@hubcap.clemson.edu


  The iPSC/860 hypercube is noted for its very fast processor and its
circuit-switched communication. By our recent measurements, the processor
is around 9-10 times faster than that of the Ncube/2. Communication speed,
however, has not improved by anything close to a comparable factor. People
who move their iPSC/2 or Ncube code to this machine are therefore
disappointed, because the speedup drops dramatically.
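
  To illustrate why the speedup drops, here is a small sketch with made-up
numbers (they are not measurements). It assumes, purely for illustration,
that the communication time stays the same while the compute time shrinks
by the processor ratio; comm_fraction and all the values are hypothetical.

```python
def comm_fraction(t_compute, t_comm):
    """Fraction of total run time spent communicating."""
    return t_comm / (t_compute + t_comm)

# Hypothetical per-node times in arbitrary units; not measured values.
t_compute_old = 90.0
t_comm = 10.0          # assumed unchanged on the faster machine
cpu_speedup = 9.0      # low end of the 9-10x processor ratio quoted above

t_compute_new = t_compute_old / cpu_speedup

before = comm_fraction(t_compute_old, t_comm)
after = comm_fraction(t_compute_new, t_comm)
print(f"communication share before: {before:.2f}")   # 0.10
print(f"communication share after:  {after:.2f}")    # 0.50
```

  Under these assumptions the code goes from spending a tenth of its time
communicating to spending half of it, which is the kind of effect that
shows up as a poor speedup.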

  We were at first amazed to see that broadcast, multicast, and many other
global communication calls are provided on the iPSC/860. But after some
experiments, we found that most of these calls are not as efficient as we
had expected.

  Broadcast is performed by a csend or isend call with the destination
parameter specified as -1. We compared the speed of this call with the
simplest method: a loop that sends a separate message to every other
node. For a message length of 500, the average elapsed time from sending
to receiving for the broadcast call is around 1/2 that of the looping
method. This is quite good, since it really does speed up the
communication. It is also possible to broadcast to a subcube of nodes.
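
  One textbook way for a hypercube broadcast to beat the sending loop is a
spanning binomial tree, in which every node that already has the message
forwards it across one more dimension per step. We do not know whether the
system broadcast works this way; the sketch below only counts message steps
and ignores startup cost and contention. hypercube_broadcast_schedule is a
hypothetical helper, not a library call.

```python
def hypercube_broadcast_schedule(dim, root=0):
    """Return a list of steps; each step is a list of (sender, receiver) pairs."""
    have = {root}                      # nodes that already hold the message
    steps = []
    for k in range(dim):
        # Every holder forwards to its neighbour across dimension k.
        step = [(node, node ^ (1 << k)) for node in sorted(have)]
        have.update(dst for _, dst in step)
        steps.append(step)
    return steps

dim = 3                                # 8-node cube, chosen for illustration
steps = hypercube_broadcast_schedule(dim)
reached = {0} | {dst for step in steps for _, dst in step}
print("nodes reached:", len(reached))
print("parallel steps (tree):", len(steps))
print("sequential sends (loop):", 2**dim - 1)
```

  For the 8-node example the tree needs 3 parallel steps where the loop
issues 7 sends one after another. This is an idealized count; the factor
we measured was about 1/2.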

  The multicast function of the iPSC/860 has exactly the same speed as the
simplest looping method (sending a separate message to each destination).
The only benefit is simpler expression (but you have to prepare a
destination list, which is at least as complicated as a loop that sends
the messages separately). This is difficult to improve because, in
contrast to broadcast, the destinations are supposed to be arbitrary
rather than regular.

  There are also global collection routines, which collect a contribution
of data from each node so that, after the operation, each node holds a copy
of the whole collection. On store-and-forward networks such an operation
substantially reduces communication time (it takes about the same time as
a single broadcast). But on the iPSC/860, it is even slightly slower than
having each node send a broadcast message, which achieves the same full
exchange.

  It seems that the only benefit of circuit-switched networks in global
communication is broadcast, which saves time by sending messages
simultaneously over several channels. Of course, for point-to-point
communication, they are certainly better than store-and-forward message
passing.
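
  For reference, the global collection described above is commonly done
on a hypercube by a dimension-by-dimension exchange (recursive doubling).
The sketch below simulates that textbook scheme; it is not a description
of what the iPSC/860 routines actually do, and hypercube_allgather is a
hypothetical name.

```python
def hypercube_allgather(dim):
    """Simulate a recursive-doubling exchange on 2**dim nodes."""
    n = 1 << dim
    data = [{node} for node in range(n)]   # each node starts with its own item
    for k in range(dim):
        # Every node swaps everything it holds with its neighbour
        # across dimension k; the right-hand side reads the old lists.
        data = [data[node] | data[node ^ (1 << k)] for node in range(n)]
    return data

dim = 3                                    # 8-node cube, chosen for illustration
result = hypercube_allgather(dim)
complete = all(held == set(range(2**dim)) for held in result)
print("every node has every item:", complete)
print("exchange steps:", dim)
```

  The scheme finishes in as many steps as the cube has dimensions, the
same number as a single tree broadcast, although the amount of data
exchanged doubles at each step.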

  The above comments cover only some negative points of circuit-switched
networks. Some experts are obviously overoptimistic about the performance
of such networks. They need to be improved, too.

  Jingwen Wang
  Dept. CS. 
  Purdue University

  wangjw@cs.purdue.edu

  
-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell