Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mimsy!eneevax!hsu
From: hsu@eneevax.UUCP (Dave Hsu)
Newsgroups: comp.arch
Subject: Re: Connection Machine Argument
Message-ID: <374@eneevax.UUCP>
Date: Thu, 4-Dec-86 10:53:17 EST
Article-I.D.: eneevax.374
Posted: Thu Dec 4 10:53:17 1986
Date-Received: Fri, 5-Dec-86 04:42:28 EST
References: <745@husc6.UUCP>
Reply-To: hsu@eneevax.UUCP (Dave Hsu)
Distribution: na
Organization: The Royal Maryland Ice Cream Consumption Laboratory
Lines: 85

In article <745@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>The idea behind the Connection Machine architecture, of having an expensive
>hypercube interconnect with extremely simple 1-bit SIMD processors at the
>nodes, has bothered me quite a bit. The below tries to rigorously argue that
>such an architecture is inappropriate. I would very much welcome any comments
>on this, either to me personally or to the net.
>...
> This has two implications on the Connection Machine architecture of many
>1-bit processors connected in a hypercube, if the goal is to eventually be able
>to realize such an architecture on a single chip/wafer.

Precisely here ^^^ is the problem.

> First, the dense hypercube interconnect should only be used when necessary,
>and a mesh interconnect should be used when possible, since it will be much
>cheaper, in hardware terms. Since most of the applications that it is claimed
>can run on a C.M. (vision, database, graphics, etc) can be run on a grid system,
>in the long term they will not be cost-effective to run on a Connection Machine
>(indeed, they are not cost-effective today, compared to alternative parallel
>architectures).

I am decidedly not an expert on the CM, not even a novice yet; I haven't seen
our department's machine and I don't have my copy of Hillis' book nearby, but
it seems to me that the router does in fact perform the first task to a degree
(the first few bits' worth, anyway), and for nearby nodes, talking is cheap.

On the second point, it seems to me that the CM is not quite as cumbersome at
convolving an image as the MPP (comparable in individual node power as well as
in number) would be if you tried to associate Lisp nodes on it. The
interconnect scheme makes a tremendous difference, but there is more
flexibility in one direction than in the other.

> Second, for applications which absolutely require the dense interconnect,
>the processors should be made as large as possible and should NOT be simple
>1-bit machines...
> For example, suppose I wish to build a complete system on a single wafer,
>and I have 10,000,000,000 units of silicon. Suppose a simple processor with 1
>unit of processing power has an area of 1000, while a complex processor with 10
>units of processing power has an area of 100,000. Suppose the hypercube
>interconnect requires n*n area...
>
> In other words, even though each individual simple processor is 10 times
>more efficient (1 processing unit per 1000 area units) than a complex processor
>(10 processing units per 100,000 area units), the system built out of complex
>processors is more powerful (600,000 processing units) than a system built of
>simple processors (100,000 processing units).
>
> This result is not dependent on the numbers in the example, but is general.
>
> Ehud Reiter

As you observe, the 2-D VLSI interconnect loses big to the 3-D brain
interconnect. But wait! Why do we want VLSI interconnects? As your numbers
show, the architecture that packs more gates per unit of silicon will have
more power.
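
For anyone who wants to check the arithmetic in the quoted wafer example, here
is a minimal Python sketch. It assumes one reading of the elided "n*n area"
clause: that n processors need n*n units of interconnect in total, on top of n
times the per-processor area. Under that assumption the quoted totals (roughly
100,000 and 600,000 processing units) come out as stated. The function names
are hypothetical, chosen only for illustration.

```python
import math


def max_processors(silicon, proc_area):
    """Largest integer n with n*proc_area + n*n <= silicon.

    Assumed cost model (not spelled out in the quoted text): n processors
    cost n*proc_area units, plus n*n units of hypercube interconnect.
    """
    # n*n + proc_area*n <= silicon  <=>  (2n + proc_area)^2 <= proc_area^2 + 4*silicon
    discriminant = proc_area * proc_area + 4 * silicon
    return (math.isqrt(discriminant) - proc_area) // 2


def system_power(silicon, proc_area, proc_power):
    """Total processing units of the largest system that fits the budget."""
    return max_processors(silicon, proc_area) * proc_power


if __name__ == "__main__":
    # Figures taken from the discussion: a wafer-scale budget and a
    # much smaller per-chip budget.
    for budget in (10_000_000_000, 2_000_000):
        simple = system_power(budget, proc_area=1_000, proc_power=1)
        complex_ = system_power(budget, proc_area=100_000, proc_power=10)
        print(f"budget {budget}: simple = {simple}, complex = {complex_}")
```

Under the same assumed model, the 2,000,000-unit budget used in point (1)
below gives 1,000 simple processors and 19 complex ones, so at small per-chip
budgets the simple processors come out ahead.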

In the wafer argument, you suppose (one) that putting everybody on one or two
wafers is superior, and (two) that the interconnect size for a complex node
does NOT increase in complexity over that for a simple node.

(1) Having everybody on one wafer is advantageous only if the interconnect
speed between processors on different chips is that much slower. However, by
keeping tremendous numbers of processors on a single wafer, what speed you
gain by throwing out bus drivers, backplanes and the like is clobbered by the
speed you lose by not using 3-D interconnects in a hypercube architecture. In
a 3-D world, if we suppose you have only 2,000,000 units of silicon available,
for the simple processor model you can pack 1,000 processors (not far from
realistic figures), but for the complex model you may have fewer than 20. Where
did the rest of the complex power go? Nowhere. The simple model caught up by
disposing of idle 2-D interconnects.

(2) In your example we now have a model of 10-bit processors with a bit-serial
I/O path. Obviously, there is a bottleneck, and that bottleneck will cost you
the load time for 10 bits, whereupon you end up with a very idle processor
that acts like...60,000 simple ones. Or you can spend the extra complexity on
silicon...

-dave
--
David Hsu  Comm. & Signal Processing Lab, Systems Research Center
/or/  "Him? No Comment."
Systems Staff, Engineering Computer Facility, Dept. of EE -UMd
The University of Maryland, College Park, MD 20742
ARPA: hsu@eneevax.umd.edu  UUCP: [seismo,allegra]!umcp-cs!eneevax!hsu