Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!fauern!NewsServ!sunmanager!uh311ae
From: uh311ae@sunmanager.lrz-muenchen.de (Henrik Klagges)
Newsgroups: comp.arch
Subject: Re: iWARP notes... it's pretty neat
Keywords: iWARP, parallel processing, vlsi,
Message-ID: <uh311ae.675674229@sunmanager>
Date: 31 May 91 07:17:09 GMT
References: <16136@life.ai.mit.edu>
Sender: news@Informatik.TU-Muenchen.DE
Organization: Technische Universitaet Muenchen, Germany
Lines: 68


Hello,

Thanks to Richard Lethin (lethin@ai.mit.edu) for his iWARP summary.
I would like to comment on some statements, because I disagree that
the iWARP is 'a neat, solid, interesting system'.

>Each processor has 4 "physical pathways" (4 incoming and 4 outgoing)
>so it connects easily into a 2-D mesh.  However, aside from the
>restriction that X-channels cannot connect to Y channels (they are a
>half-clock out of phase) they could be connected in any topology.

Basically, 4 bidirectional links isn't bad. I would prefer the 8 of the
new transputers, especially since they support a virtual channel
concept, i.e., they give you as many software channels as you want.
The restriction to X-to-X and Y-to-Y connections, however, is severe
and sounds like an engineering joke.

> They claim 40 Mbyte/sec (at 20MHz) on each channel; there are 8 channels. 
> (We did some benchmarking, normalizing for the slow clock, and even in 
> the tightest loop we could construct we could only get one processor to 
> send to the other at half of peak.

Which suggests that the claimed rate will not be reached in any
real-world system.
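
For the record, the arithmetic behind this, as a rough sketch only. The
40 Mbyte/sec, the 8 channels and the "half of peak" are Richard's
figures. The names below are mine, and scaling the one measured channel
to all 8 is purely my assumption, not something anybody measured.

```python
# Figures quoted from Richard's summary; all names are illustrative only.
CLAIMED_MB_PER_S = 40.0    # claimed peak per channel at 20 MHz
CHANNELS = 8               # 4 incoming + 4 outgoing
MEASURED_FRACTION = 0.5    # "half of peak" in the tightest loop


def sustained_per_channel(claimed=CLAIMED_MB_PER_S, fraction=MEASURED_FRACTION):
    """Per-channel rate actually seen, given the measured fraction of peak."""
    return claimed * fraction


def aggregate(per_channel, channels=CHANNELS):
    """Total over all channels, ASSUMING every channel behaves the same."""
    return per_channel * channels


print(sustained_per_channel())             # 20.0 Mbyte/sec per channel
print(aggregate(CLAIMED_MB_PER_S))         # 320.0 Mbyte/sec claimed in total
print(aggregate(sustained_per_channel()))  # 160.0 Mbyte/sec under my assumption
```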

>The distinguishing feature of the iWARP instruction set is a VLIW-mode
>96-bit long Compute & Access instruction (C&A).  An FP multiply, an FP
>add, two memory operations, and a loop test can be issued and executed
>parallel.  A team of compiler people is working to make their
>single-chip compiler produce this instruction.  Currently, it does
>not.  However, the assembly language inlining is particularly
>well-implemented and should allow one to hand-code an inner loop
>seamlessly, painlessly, and efficiently. 

A 'single-chip compiler' which 'currently does not', for a parallel
system that is already being sold?
This means that 'the distinguishing feature' essentially doesn't work
unless you hand-code. This sounds like the story of optimizing
compilers and i860 performance.

> These boards are expensive: around $30,000 for the SBA and $15,000 for
> the SIB. 
...
> a single 20 MHz iWARP chip was 1/2 the speed of the SPARC host
...
> There are some questions about the scalability of this system.
> Per-node price is still very high ($9000, including RAM).  It's not
> clear why the price is so high.

Half-Sparc performance at $9K? I'd rather buy a full Sparc (including
color monitor, 16 Mbytes of RAM and a hard disk). For large parallelism,
there is still the Connection Machine (SIMD), a BBN Butterfly, a Meiko
Computing Surface, Paracom ... for less money. The fact that the iWARP
has no cache and no DRAM support (which transputers, for example, have)
makes it very vulnerable to high-speed SRAM prices, and very unlikely
to go much higher than 20MHz in clock frequency.
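
To put a number on the $9K-for-half-a-Sparc point, here is a
back-of-the-envelope sketch. It takes Richard's two figures ($9000 per
node, 1/2 the speed of the SPARC host) and assumes, very generously,
that performance adds up linearly across nodes. The function name is
mine and purely illustrative.

```python
def cost_per_sparc_equivalent(node_price, relative_speed):
    """Dollars needed to match one SPARC host, ASSUMING linear scaling.

    node_price     -- price of one node in dollars
    relative_speed -- node speed as a fraction of the SPARC host's speed
    """
    return node_price / relative_speed


# $9000 per node, each node at half the speed of the SPARC host
print(cost_per_sparc_equivalent(9000, 0.5))  # 18000.0
```

Any communication overhead only makes the real figure worse.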

The iWARP was designed from the very beginning to be a building block
for 2D-mesh dataflow computers. Given the right problems, dataflow
can be very fast; given the wrong ones, it's useless. At half-Sparc
speed the iWARP is slow even on this very specialized home turf, so I
say, forget it.

Cheers ! Rick@vee.lrz-muenchen.de

Henrik Klagges, U of Munich, Physics Dep.
#include "std_disclaimer.h"