Xref: utzoo comp.arch:21864 comp.parallel:2387
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!hubcap!fpst
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch,comp.parallel
Subject: Networking for Distributed Computing
Message-ID: <1991Apr5.182853.20728@hubcap.clemson.edu>
Date: 5 Apr 91 14:39:33 GMT
Sender: usenet@ee.udel.edu
Followup-To: comp.arch
Organization: College of Marine Studies, U. Del.
Lines: 53
Approved: parallel@hubcap.clemson.edu
Nntp-Posting-Host: perelandra.cms.udel.edu
Source-Info:  From (or Sender) name not authenticated.


The recent introduction of the HP "Snakes" computer systems
underscores a critical characteristic of modern scientific computing:
the rules of the game change *very* quickly.

It is easy to convince oneself that the most cost-effective
computing environment over the long term (which in these days means "a
few years") is a heterogeneous distributed network, with low-cost
hardware that is updated incrementally.

It is vital in this case to use open systems and off-the-shelf
technology.   Unfortunately, the most commonly available networking
option (Ethernet) uses a broadcast approach, which is definitely
sub-optimal for the communications needs of many "natural"
parallel distributed algorithms.

When I talked to my IBM salescritter about this topic, he suggested
using an additional SCSI interface as the networking option.  I think
that this is a potentially important idea.  It uses off-the-shelf
hardware and only requires writing some (fairly simple?) SCSI device
drivers.

The system I envision would consist of some moderate number of
high-performance, cheap workstations (IBM RS/6000-320 or HP/9000-720);
let's take 32 as an example.  An additional SCSI interface on each unit
would provide 7 high-speed, point-to-point interfaces, which could be
reconfigured (via a patch panel) to provide a large number of
different topologies, including: a line, a 2-D mesh, a 3-D mesh, a
ring, etc.
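
To make the topology claim concrete, here is a minimal sketch that
builds neighbour lists for 32 nodes and checks that each layout fits
within 7 links per node.  The function names and the particular mesh
shapes (4x8 and 2x4x4) are illustrative assumptions, not part of any
existing system, and the sketch takes the "7 point-to-point links"
figure above at face value.

```python
from itertools import product

MAX_LINKS = 7  # links per node, as assumed in the text above


def ring(n):
    """Neighbours of each node in an n-node ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def mesh(dims):
    """Neighbours of each node in a non-wrapping mesh of shape dims."""
    coords = list(product(*(range(d) for d in dims)))
    index = {c: i for i, c in enumerate(coords)}
    links = {i: set() for i in range(len(coords))}
    for c, i in index.items():
        for axis in range(len(dims)):
            for step in (-1, 1):
                nb = list(c)
                nb[axis] += step
                if 0 <= nb[axis] < dims[axis]:  # stay inside the mesh
                    links[i].add(index[tuple(nb)])
    return links


def max_degree(links):
    """Largest number of links any single node needs."""
    return max(len(nbrs) for nbrs in links.values())


# Hypothetical layouts for the 32-node example.
layouts = {
    "line": mesh((32,)),
    "ring": ring(32),
    "2-D mesh (4x8)": mesh((4, 8)),
    "3-D mesh (2x4x4)": mesh((2, 4, 4)),
}
for name, links in layouts.items():
    print(name, max_degree(links), max_degree(links) <= MAX_LINKS)
```

Each entry in a neighbour list corresponds to one cable on the patch
panel, so switching topologies amounts to re-plugging according to a
different table.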

I believe that it would be easy to modify a Linda-like software system
to operate efficiently on such a system.  Simply map particular named
tuple-spaces into particular device drivers so that point-to-point
communications may be executed directly.  A "default" or un-named
tuple space could still be handled in a global fashion using the
broadcast network.  It might be used for things like global
accumulations and synchronization messages.
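
The mapping described above might look roughly like the following
sketch.  The Router class, its bind and out methods, and the use of
plain Python lists as stand-ins for the SCSI links and the broadcast
network are all hypothetical; this is not the API of Linda or of any
real driver.

```python
class Router:
    """Send tuples for named tuple spaces over dedicated links.

    Anything sent to an unnamed (or unbound) tuple space falls back
    to the broadcast network.
    """

    def __init__(self, broadcast):
        self.broadcast = broadcast  # callable taking one tuple
        self.links = {}             # tuple-space name -> callable

    def bind(self, space, link):
        """Map a named tuple space onto one point-to-point link."""
        self.links[space] = link

    def out(self, tup, space=None):
        """Linda-style 'out': place a tuple into a tuple space."""
        send = self.links.get(space, self.broadcast)
        send(tup)


# Stand-ins for real transports: each list records what was "sent".
east_link, broadcast_net = [], []

router = Router(broadcast_net.append)
router.bind("east", east_link.append)

router.out(("halo", 3, 1.5), space="east")  # neighbour exchange
router.out(("global_sum", 42.0))            # default space: broadcast
```

In a real system each bound callable would wrap a write to the
corresponding device driver; the point is only that the choice of
transport is hidden behind the tuple-space name.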

A suitably designed code (for example a 3-D spectral element code for
fluid dynamics using explicit time marching techniques) should be
capable of 1 GFLOPS performance on a network of 32 IBM RS/6000-320s.
With current university discounts and third-party memory, such a
system (with 32 MB RAM per node) could be assembled for less than
$500,000.  Apparently, the HP/9000-720 would be capable of slightly
higher performance at a similar cost (though I don't know about
third-party memory for the HPs).
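
For reference, the per-node figures implied by these numbers can be
worked out directly.  This is only arithmetic on the values quoted
above; it assumes perfect parallel efficiency and treats $500,000 as
an upper bound on the total cost.

```python
NODES = 32
TARGET_FLOPS = 1e9        # 1 GFLOPS aggregate
BUDGET_USD = 500_000      # upper bound on total system cost
RAM_MB_PER_NODE = 32

# Sustained rate each node must deliver if nothing is lost to
# communication or load imbalance.
mflops_per_node = TARGET_FLOPS / NODES / 1e6

# Upper bound on what each node (with memory) may cost.
usd_per_node = BUDGET_USD / NODES

# Aggregate memory across the whole network.
total_ram_mb = NODES * RAM_MB_PER_NODE

print(mflops_per_node, usd_per_node, total_ram_mb)
```

Any communication overhead raises the per-node rate required to reach
the aggregate target.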

So what is wrong with this idea?  Am I misunderstanding what can be
done with a SCSI interface or how hard the implementation of a
buffered FIFO would be?
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell