Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mimsy!eneevax!hsu
From: hsu@eneevax.UUCP (Dave Hsu)
Newsgroups: comp.arch
Subject: Re: Connection Machine Argument
Message-ID: <374@eneevax.UUCP>
Date: Thu, 4-Dec-86 10:53:17 EST
Article-I.D.: eneevax.374
Posted: Thu Dec  4 10:53:17 1986
Date-Received: Fri, 5-Dec-86 04:42:28 EST
References: <745@husc6.UUCP>
Reply-To: hsu@eneevax.UUCP (Dave Hsu)
Distribution: na
Organization: The Royal Maryland Ice Cream Consumption Laboratory
Lines: 85

In article <745@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>The idea behind the Connection Machine architecture, of having an expensive
>hypercube interconnect with extremely simple 1-bit SIMD processors at the
>nodes, has bothered me quite a bit.  The below tries to rigorously argue that
>such an architecture is inappropriate.  I would very much welcome any comments
>on this, either to me personally or to the net.
>...
>     This has two implications on the Connection Machine  architecture  of  many
>1-bit  processors connected in a hypercube, if the goal is to eventually be able
>to realize such an architecture on a single chip/wafer.

Precisely here ^^^ is the problem.

>     First, the dense hypercube interconnect should only be used when necessary,
>and  a  mesh  interconnect  should  be used when possible, since it will be much
>cheaper, in hardware terms.  Since most of the applications that it  is  claimed
>can run on a C.M. (vision, database, graphics, etc) can be run on a grid system,
>in the long term they will not be cost-effective to run on a Connection  Machine
>(indeed,  they  are  not  cost-effective today, compared to alternative parallel
>architectures).

I am decidedly not an expert on the CM, not even a novice yet; I haven't
seen our department's machine and I don't have my copy of Hillis' book
nearby.  But it seems to me that the router does in fact do what your
first point asks, to a degree (the first few bits' worth, anyway): for
nearby nodes, talking is cheap.

On the second point, it seems to me that the CM is not quite as
cumbersome at convolving an image as the MPP (comparable in
individual node power as well as in number of nodes) would be if you
tried to associate Lisp nodes on it.  The interconnect scheme makes a
tremendous difference, but there is more flexibility in one direction
than in the other: a hypercube can stand in for a grid more easily
than a grid can stand in for a hypercube.

>     Second, for applications which absolutely require the  dense  interconnect,
>the  processors  should be made as large as possible and should NOT be simple 1-
>bit machines...
>     For example, suppose I wish to build a complete system on a  single  wafer,
>and  I  have 10,000,000,000 units of silicon.  Suppose a simple processor with 1
>unit of processing power has an area of 1000, while a complex processor with  10
>units  of processing power has an area of 100,000.  Suppose the hypercube inter-
>connect requires n*n area...
>
>     In other words, even though each individual simple processor  is  10  times
>more  efficient (1 processing unit per 1000 area units) than a complex processor
>(10 processing units per 100,000 area units), the system built  out  of  complex
>processors  is  more powerful  (600,000 processing units) than a system built of
>simple processors (100,000 processing units).
>
>     This result is not dependent on the numbers in the example, but is general.
>
>						Ehud Reiter

As you observe, the 2-D VLSI interconnect loses big to the 3-D brain
interconnect.  But wait!  Why do we want VLSI interconnects?  As your
numbers show, the architecture that packs more gates per unit of
silicon will have more power.  In the wafer argument, you suppose (one)
that putting everybody on one or two wafers is superior, and (two) that
the interconnect for a complex node does NOT grow in size or complexity
over that for a simple node.

(1) Having everybody on one wafer is advantageous only if the
interconnect speed between processors on different chips is that much
slower.  However, by keeping tremendous numbers of processors on a
single wafer, what speed you gain by throwing out bus drivers,
backplanes and the like is clobbered by the speed you lose by not using
3-D interconnects in a hypercube architecture.  In a 3-D world, if we
suppose you have only 2,000,000 units of silicon available, then under
your own n*n interconnect cost the simple processor model can pack
1,000 processors (1,000 x 1,000 for the nodes plus 1,000 x 1,000 for
the interconnect; not far from realistic figures), but the complex
model gets fewer than 20, since 20 nodes alone would eat all 2,000,000
units.  Where did the rest of the complex power go?  Nowhere.  The
simple model caught up by disposing of idle 2-D interconnects.
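
For anyone who wants to check the arithmetic, here is a rough sketch of
the area model as I read it from the quoted article: n nodes cost
n * node_area for the processors plus n*n for the hypercube
interconnect.  The function names are my own invention, and the model
is only the toy one from the example, not a claim about real silicon.

```python
import math

def max_nodes(silicon, node_area):
    """Largest n with n*node_area + n*n <= silicon.

    n*n is the interconnect cost assumed in the quoted example.
    """
    # Positive root of n^2 + node_area*n - silicon = 0, rounded down.
    n = (math.isqrt(node_area * node_area + 4 * silicon) - node_area) // 2
    # Nudge in case integer rounding left us one off in either direction.
    while (n + 1) * node_area + (n + 1) ** 2 <= silicon:
        n += 1
    while n > 0 and n * node_area + n * n > silicon:
        n -= 1
    return n

def system_power(silicon, node_area, node_power):
    """Total processing units for a system that fills the silicon budget."""
    return max_nodes(silicon, node_area) * node_power

if __name__ == "__main__":
    for silicon in (10_000_000_000, 2_000_000):
        simple = system_power(silicon, 1_000, 1)
        complex_ = system_power(silicon, 100_000, 10)
        print(silicon, "simple:", simple, "complex:", complex_)
```

With the full 10,000,000,000-unit wafer this gives roughly the 100,000
versus 600,000 of the original example; with 2,000,000 units the
simple model fits 1,000 nodes and the complex model fits 19.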

(2) In your example we now have a model of 10-bit processors with a
bit-serial I/O path.  Obviously, there is a bottleneck, and that
bottleneck will cost you the load time for 10 bits, whereupon you end
up with a very idle processor that acts like...60,000 simple ones.  Or
you can spend the extra complexity on silicon...
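
A back-of-the-envelope sketch of that bottleneck, under an assumption
of my own: the work is I/O-bound, so a node does useful work only as
fast as operands arrive over its link.  The function and parameter
names here are hypothetical, chosen just for illustration.

```python
def effective_power(nodes, node_power, link_bits_per_step, word_bits):
    """Total power when each node is throttled by its I/O link.

    Assumes (hypothetically) that every operation needs one fresh
    word_bits-wide operand delivered over the link.
    """
    # Fraction of full speed the link can sustain, capped at 1.
    io_fraction = min(1.0, link_bits_per_step / word_bits)
    return nodes * node_power * io_fraction

if __name__ == "__main__":
    # 60,000 complex nodes of power 10, fed one bit at a time.
    print(effective_power(60_000, 10, link_bits_per_step=1, word_bits=10))
    # The same nodes with a 10-bit-wide path, paid for in silicon.
    print(effective_power(60_000, 10, link_bits_per_step=10, word_bits=10))
```

Under that assumption the bit-serial system delivers 60,000 units and
the wide-path system 600,000, but the wide path is exactly the extra
interconnect area the example left out.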

-dave
-- 
David Hsu 	   Comm. & Signal Processing Lab, Systems Research Center /or/
"Him? No Comment." Systems Staff, Engineering Computer Facility, Dept. of EE
	-UMd	   The University of Maryland, College Park, MD 20742
ARPA: hsu@eneevax.umd.edu	UUCP: [seismo,allegra]!umcp-cs!eneevax!hsu