Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!brunix!rph
From: rph@cs.brown.edu (Richard Hughey)
Newsgroups: comp.arch
Subject: Re: Mass produced custom chips
Message-ID: <73517@brunix.UUCP>
Date: 25 Apr 91 18:34:20 GMT
References: <12785@pt.cs.cmu.edu>
Sender: news@brunix.UUCP
Reply-To: rph@cs.brown.edu (Richard Hughey)
Organization: Brown University Department of Computer Science
Lines: 62

In article <12785@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>Usually, the suggestion is that the chip will fit a special niche -
>such as a radar autocorrelator chip or a pattern matcher chip - or
>will be a coprocessor (in some loose sense of the word). The
>general-purpose market is to be avoided, not only for the good
>reasons which you gave, but also because it's increasingly hard to
>find big wins there. In a niche, it may be possible to get an
>enormous win: the Splash board is sometimes 200 times faster than a
>16K-PE CM-2.
>
>Don D.C.Lindsay Carnegie Mellon Robotics Institute

Comparing co-processors against the Connection Machine isn't exactly
the way to go - the CM-2 can be regarded as a massively parallel (and
massively COSTLY) general-purpose co-processor, in great contrast to
slightly- or non-parallel supercomputers. Splash's main advantage over
the CM-2 is its cost - on the sequence comparison example, the CM-2 is
more realistically about 10 times slower than the Splash board on
100x100 sequence comparison. (The version mentioned in Computer is for
distributed sequence comparison, using 100 of the 16K PEs. A 100x100
(or, equivalently, 100x128) comparison can be done in 0.17 seconds, in
comparison to Splash's 0.020 seconds. CM-2 performance could be further
increased by a factor of 4 or more by using minimum-size words, leading
to a somewhat more complicated program.)
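
For readers who haven't seen the benchmark, here is a minimal sketch of
what "100x100 sequence comparison" computes and why it parallelizes
well. It is plain Python, not Splash or CM-2 code. The unit edit costs
and the function name edit_distance_wavefront are assumptions made for
illustration; the real implementations may use other cost functions.
The table is filled one anti-diagonal at a time, since every cell on an
anti-diagonal depends only on earlier anti-diagonals and could
therefore be handled by its own processing element.

```python
def edit_distance_wavefront(a, b):
    """Edit distance between sequences a and b, filled by anti-diagonals.

    Assumes unit cost for insert, delete, and substitute (illustrative only).
    Returns (distance, number of anti-diagonal steps that held work).
    """
    m, n = len(a), len(b)
    # d[i][j] = cost of turning a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    steps = 0
    # Cells with the same i + j = k depend only on diagonals k-1 and k-2,
    # so the inner loop is the part a parallel machine could do at once.
    for k in range(2, m + n + 1):
        cells = range(max(1, k - n), min(m, k - 1) + 1)
        for i in cells:
            j = k - i
            substitute = d[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            d[i][j] = min(substitute, d[i - 1][j] + 1, d[i][j - 1] + 1)
        if cells:
            steps += 1
    return d[m][n], steps


if __name__ == "__main__":
    distance, steps = edit_distance_wavefront("kitten", "sitting")
    print(distance, steps)
```

For two non-empty sequences of lengths m and n the sketch takes
m + n - 1 wavefront steps instead of m * n sequential cell updates,
which is 199 steps for the 100x100 case.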

Where Splash does win (vs. the CM-2) is on size (cost) and its ability
to prototype hardware designs before fabrication - programming can be
slow (the seq. comp. program has many, many, many lines of code) but is
much faster than designing and fabricating a new system, which, once up
and running, might not be the perfect solution to a problem.

As part of my thesis, I've implemented a programmable linear systolic
array, designed specifically for combinatorial applications (sequence
comparison prime among them). The system (the Brown Systolic Array, or
B-SYS) combines traditional SIMD programming with very efficient
systolic communication. Sequence comparison variations run 5-40 lines
of B-SYS code per cell program, though some systolic programming issues
I'm looking at make this much easier.

There's a running 10-chip (470-processor) prototype system that does
simple seq. comparison at about 1/20 the speed of Splash; it is so slow
because each instruction execution requires 3 I/O writes over an ISA
bus (ugh!). A full implementation (32 chips (1504 PEs) on a single
board w/ local instruction sequencer) could perform 3-5 8-bit GOPS (2x
faster than Splash). A redesign of the chip in 0.8 micron CMOS could
increase PE density (and performance) by a factor of 10.

There's a paper upcoming in ICPP '91 on this, and I can send preprints
to anyone interested. Also, the tech report version of my thesis
should be out in a couple of months.

- Richard

---------------------------------------
Richard Hughey          INTERNET: rph@cs.brown.edu
Brown University        BITNET: rph@browncs
Box 1910                (decvax, allegra, ...)!brunix!rph
Providence, RI 02912