Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!fauern!NewsServ!sunmanager!uh311ae
From: uh311ae@sunmanager.lrz-muenchen.de (Henrik Klagges)
Newsgroups: comp.arch
Subject: Re: iWARP notes... it's pretty neat
Keywords: iWARP, parallel processing, vlsi,
Message-ID:
Date: 31 May 91 07:17:09 GMT
References: <16136@life.ai.mit.edu>
Sender: news@Informatik.TU-Muenchen.DE
Organization: Technische Universitaet Muenchen, Germany
Lines: 68

Hello,

Thanks to Richard Lethin (lethin@ai.mit.edu) for his iWARP summary.
I would like to comment on some statements, because I disagree that
the iWARP is 'a neat, solid, interesting system'.

>Each processor has 4 "physical pathways" (4 incoming and 4 outgoing)
>so it connects easily into a 2-D mesh. However, aside from the
>restriction that X-channels cannot connect to Y channels (they are a
>half-clock out of phase) they could be connected in any topology.

Basically, 4 bidirectional links aren't bad. I would prefer the 8 of
the new transputers, especially given the fact that they support a
virtual channel concept, i.e., they give you as many software channels
as you want. The restriction to X-to-X and Y-to-Y connections,
however, is severe and sounds like an engineering joke.

> They claim 40 Mbyte/sec (at 20MHz) on each channel; there are 8 channels.
> (We did some benchmarking, normalizing for the slow clock, and even in
> the tightest loop we could construct we could only get one processor to
> send to the other at half of peak.

That shows the claimed figure cannot be reached in any real-world
system.

>The distinguishing feature of the iWARP instruction set is a VLIW-mode
>96-bit long Compute & Access instruction (C&A). An FP multiply, an FP
>add, two memory operations, and a loop test can be issued and executed
>parallel. A team of compiler people is working to make their
>single-chip compiler produce this instruction. Currently, it does
>not.
>However, the assembly language inlining is particularly
>well-implemented and should allow one to hand-code an inner loop
>seamlessly, painlessly, and efficiently.

A 'single-chip compiler' which 'currently does not', for a parallel
system that is already being sold? This means that 'the distinguishing
feature' essentially doesn't work unless you hand-code. This sounds
like the story of optimizing compilers and i860 performance.

> These boards are expensive: around $30,000 for the SBA and $15,000 for
> the SIB.
...
> a single 20 MHz iWARP chip was 1/2 the speed of the SPARC host
...
> There are some questions about the scalability of this system.
> Per-node price is still very high ($9000, including RAM). It's not
> clear why the price is so high.

Half-Sparc performance at $9K? I'd rather buy a full Sparc (including
color monitor, 16 Megs & HDD). For large parallelism, there is still a
Connection Machine (SIMD), a BBN Butterfly, a Meiko Computing Surface,
Paracom ... for less money. The fact that the iWARP has no cache and
no DRAM support (which transputers, for example, do have) makes it
very vulnerable to high-speed SRAM prices - and very unlikely to zoom
much higher than 20MHz in clock frequency.

The iWARP was designed from the very beginning to be a building block
for 2D-mesh dataflow computers. Given the right problems, dataflow can
be very fast; given the wrong ones, it's useless. At half-Sparc speed
the iWARP is slow even on this very specialized home turf, so I say,
forget it.

Cheers!

Rick@vee.lrz-muenchen.de
Henrik Klagges, U of Munich, Physics Dep.
#include "std_disclaimer.h"
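
For anyone who wants to redo the arithmetic behind the figures quoted
in this post, here is a minimal sketch in Python. It assumes
1 Mbyte = 10^6 bytes, that the 'half of peak' benchmark result applies
to a single channel, and that cost scales linearly with relative
speed. The constant and function names are made up for illustration
and come from no iWARP documentation.

```python
# Figures quoted in the post; everything derived from them is plain arithmetic.
CLOCK_HZ = 20_000_000          # 20 MHz iWARP clock
PEAK_MB_S_PER_CHANNEL = 40.0   # claimed peak per channel (Mbyte/sec)
MEASURED_FRACTION = 0.5        # benchmark reached half of peak
NODE_PRICE_USD = 9000.0        # per-node price, including RAM
SPEED_VS_SPARC = 0.5           # one iWARP chip ran at 1/2 the SPARC host


def bytes_per_clock(peak_mb_s, clock_hz):
    """Bytes a channel must move per clock tick to reach its claimed peak.

    Assumption: 1 Mbyte = 10**6 bytes.
    """
    return peak_mb_s * 1_000_000 / clock_hz


def sustained_mb_s(peak_mb_s, fraction):
    """Per-channel rate actually observed, as a fraction of the claimed peak."""
    return peak_mb_s * fraction


def usd_per_sparc_equivalent(price_usd, relative_speed):
    """Price of one SPARC's worth of performance.

    Assumption: performance scales linearly with the number of nodes bought.
    """
    return price_usd / relative_speed


if __name__ == "__main__":
    print(bytes_per_clock(PEAK_MB_S_PER_CHANNEL, CLOCK_HZ))
    print(sustained_mb_s(PEAK_MB_S_PER_CHANNEL, MEASURED_FRACTION))
    print(usd_per_sparc_equivalent(NODE_PRICE_USD, SPEED_VS_SPARC))
```

Under those assumptions the claimed peak works out to 2 bytes per
clock per channel, the measured rate to 20 Mbyte/sec per channel, and
one SPARC's worth of iWARP performance to $18,000.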