Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!brunix!rph
From: rph@cs.brown.edu (Richard Hughey)
Newsgroups: comp.arch
Subject: Re: Mass produced custom chips
Message-ID: <73517@brunix.UUCP>
Date: 25 Apr 91 18:34:20 GMT
References: <COLWELL.91Apr23090526@pdx023.pdx023> <12785@pt.cs.cmu.edu>
Sender: news@brunix.UUCP
Reply-To: rph@cs.brown.edu (Richard Hughey)
Organization: Brown University Department of Computer Science
Lines: 62

In article <12785@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>Usually, the suggestion is that the chip will fit a special niche -
>such as a radar autocorrelator chip or a pattern matcher chip - or
>will be a coprocessor (in some loose sense of the word). The
>general-purpose market is to be avoided, not only for the good
>reasons which you gave, but also because it's increasingly hard to
>find big wins there. In a niche, it may be possible to get an
>enormous win: the Splash board is sometimes 200 times faster than a
>16K-PE CM-2.
>
>Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute

Comparing co-processors against the Connection Machine isn't
exactly the way to go: the CM-2 can be regarded as a massively
parallel (and massively COSTLY) general-purpose co-processor, in
great contrast to slightly- or non-parallel supercomputers.  Splash's
main advantage over the CM-2 is its cost.  On 100x100 sequence
comparison, the CM-2 is more realistically about 10 times slower
than the Splash board.  (The version mentioned in Computer is for
distributed sequence comparison, using 100 of the 16K PEs.  A
100x100 (or, equivalently, 100x128) comparison can be done in 0.17
seconds on the CM-2, compared to Splash's 0.020 seconds.  [CM-2
performance could be further increased by a factor of 4 or more by
using minimum-size words, at the cost of a somewhat more
complicated program.])
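
To make the arithmetic behind "about 10 times" explicit, here is a
small Python check using only the two timings quoted above.  The
function name is made up for illustration, and the factor of 4 is
just the lower bound of the "4 or more" estimate, not a measurement.

```python
def slowdown(cm2_seconds, splash_seconds):
    """How many times slower the CM-2 run is than the Splash run."""
    return cm2_seconds / splash_seconds

# Timings quoted in the post for 100x100 sequence comparison.
CM2_SECONDS = 0.17
SPLASH_SECONDS = 0.020

# As measured: roughly 8.5x, i.e. "about 10 times", not 200 times.
as_measured = slowdown(CM2_SECONDS, SPLASH_SECONDS)

# Assumption: the minimum-size-word version is exactly 4x faster
# (the post says "4 or more"), which would leave a gap near 2x.
with_small_words = slowdown(CM2_SECONDS / 4, SPLASH_SECONDS)

print(round(as_measured, 2), round(with_small_words, 2))
```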

Where Splash does win (vs CM-2) is on size (cost) and its ability
to prototype hardware designs before fabrication.  Programming
can be slow (the seq. comp. program has many many many lines of
code) but is much faster than designing and fabricating a new
system, which, when up and running, might not be the perfect
solution to a problem.

As part of my thesis, I've implemented a programmable linear
systolic array, designed specifically for combinatorial
applications (sequence comparison prime among them).  The system
(The Brown Systolic Array, or B-SYS) has traditional SIMD
programming with very efficient systolic communication.  Sequence
comparison variations run 5-40 lines of B-SYS code per cell
program, though some systolic programming issues I'm looking at
make this much easier.  There's a running 10-chip (470-processor)
prototype system that does simple seq. comparison at about 1/20 the
speed of Splash; it is so slow because each instruction execution
requires 3 I/O writes over an ISA bus (ugh!).  A full
implementation (32 chips (1504 PEs) on a single board w/ local
instruction sequencer) could perform 3-5 8-bit GOPS (2x faster
than Splash).  A redesign of the chip to 0.8 micron CMOS could
increase PE density (and performance) by a factor of 10.
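
For readers who haven't seen why sequence comparison maps well onto
a linear array, here is a rough Python sketch.  It is NOT B-SYS code
and says nothing about the actual cell program; it is a sequential
simulation, assuming plain unit-cost edit distance, of the wavefront
order such an array would use.  Each pass of the outer loop is one
time step, and every cell on that anti-diagonal could be computed by
a separate PE at once, since it only needs values from the two
previous anti-diagonals.

```python
def wavefront_edit_distance(a, b):
    """Unit-cost edit distance, computed one anti-diagonal at a time.

    Illustrative assumption: cost 1 for insert, delete and mismatch.
    """
    m, n = len(a), len(b)
    prev2 = {}       # anti-diagonal d-2, keyed by row index i
    prev1 = {0: 0}   # anti-diagonal d-1; starts as cell (0, 0)
    for d in range(1, m + n + 1):          # one "time step" per diagonal
        cur = {}
        # Rows i with a valid column j = d - i; these are independent
        # of each other, so a parallel array could do them together.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                cur[i] = j                 # top border: j insertions
            elif j == 0:
                cur[i] = i                 # left border: i deletions
            else:
                mismatch = 1 if a[i - 1] != b[j - 1] else 0
                cur[i] = min(
                    prev2[i - 1] + mismatch,   # from cell (i-1, j-1)
                    prev1[i - 1] + 1,          # from cell (i-1, j)
                    prev1[i] + 1,              # from cell (i, j-1)
                )
        prev2, prev1 = prev1, cur
    return prev1[m]

print(wavefront_edit_distance("kitten", "sitting"))
```

Run this way, an m x n comparison takes m + n steps of the outer
loop rather than m * n sequential cell updates, which is the whole
appeal of the systolic arrangement.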

There's a paper upcoming in ICPP '91 on this, and I can send
preprints to anyone interested.  Also, the tech report version
of my thesis should be out in a couple of months.


      - Richard


--------------------------------------- Richard Hughey              
INTERNET:  rph@cs.brown.edu	        Brown University            
BITNET:    rph@browncs		        Box 1910                    
(decvax, allegra, ...)!brunix!rph       Providence, RI 02912