Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!cca!ima!pbear!peterb
From: peterb@pbear.UUCP
Newsgroups: net.arch
Subject: Re: Cube designs
Message-ID: <48@pbear.UUCP>
Date: Sun, 17-Feb-85 02:35:30 EST
Article-I.D.: pbear.48
Posted: Sun Feb 17 02:35:30 1985
Date-Received: Wed, 20-Feb-85 07:31:07 EST
Lines: 72
Nf-ID: #N:pbear:22800001:000:4098
Nf-From: pbear!peterb Feb 16 01:44:00 1985

I have not looked deeply into the design of the 'Cube' as people have called it (i.e. taking n**3 CPU/memory units and hooking them up in parallel to accomplish a given task), so bear with me.

First of all, rating the speed of any highly parallel system is difficult at best. You have to take your benchmarks in stride. If I have a matrix problem that can be decomposed into two semi-independent processes, then a VAX-11/782 would execute that program about twice as fast as a VAX-11/780. On the other hand, if the program to be benchmarked is highly sequential in nature (e.g. nth-order numerical analysis of differential equations), then the 782 and the 780 are going to run at about the same speed. This applies to any parallel architecture. So a new standard of speed measure is required. I think that something along the lines of Data Flow Operations/Second (DFO's -or- Doofoh's) would fit the bill to benchmark these types of machines. Then if you take the Cube and the Cray and put them both on the same scale, one that reflects the architectures, you can compare speeds; otherwise you are comparing apples to oranges.

Second, any type of parallel machine relies heavily upon the distribution of data from one machine to another. This figures into the overall speed of the machine, since it is the exchange of data between computing devices that drives any type of parallel architecture. The interconnect can be a low- or high-speed design, such as an Ethernet (serial) or a backplane (parallel), or even a combination of the two (e.g.
eight processing units on a backplane with a serial line to connect to other backplanes). This was demonstrated by Cm*, built at CMU. Their data showed a severe degradation in OVERALL system data transfer as the amount of nonlocal I/O increased. This is obvious to almost everybody. Cm* was limited only by the speed of its backplane.

Third, some type of control facility has to run the entire mess. This can be slower than the other elements, since it does not require the massive data throughput of a processing element, but it still must have a clean, quick architecture that lends itself to controlling "devices". Some form of overgrown/homebrew bit-slice seems optimal in this situation, since some instructions have to be general enough for scheduling algorithms, processing I/O, fielding interrupts, etc., but quick and clean enough to service the resource requests of each data element. Whether this control facility is distributed or singular is up for debate these days. Different groups have different ideas regarding this.

The idea of a cube is nice, and I think that it is about the fastest architecture around for what it is designed for, but in no way will it compete with a Cray at sequential MIPS. To compete in parallel MIPS the cube would have to be large, but the size would be manageable.

In order to increase/control data throughput, I think that a bus architecture that combines the best of serial and parallel is in order. I think that each processing unit should be hooked to three buses: one for the X direction, one for the Y direction, and one for the Z direction. This would require 3n**2 buses (n = size of the cube on one side), each with n elements on it. (For example, 8 data elements, a cube 2 on each side, require 4 buses in the X direction, 4 buses in the Y direction, and 4 buses in the Z direction, giving a total of 12 data buses.) There would be a total of 3n**3 bus connections within the cube (three per element, or n per bus).
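
The bus arithmetic above can be checked by brute force. What follows is only a rough sketch in Python, not part of any actual Cube design: the names cube_buses, buses_of, and cube_counts are made up for illustration, and it assumes that a "connection" means one element's attachment to one bus.

```python
from itertools import product

def cube_buses(n):
    """List every bus in an n x n x n cube, one bus per axis-aligned row.

    A bus is named (axis, fixed): the axis it runs along plus the two
    coordinates shared by all n elements sitting on it.
    """
    return [(axis, fixed)
            for axis in "XYZ"
            for fixed in product(range(n), repeat=2)]

def buses_of(element):
    """Return the three buses (one per axis) that element (x, y, z) is on."""
    x, y, z = element
    return [("X", (y, z)), ("Y", (x, z)), ("Z", (x, y))]

def cube_counts(n):
    """Count elements, buses, and element-to-bus connections for side n."""
    elements = list(product(range(n), repeat=3))
    buses = cube_buses(n)
    # Assumption: one connection per (element, bus) attachment.
    connections = sum(len(buses_of(e)) for e in elements)
    return {"elements": len(elements),
            "buses": len(buses),
            "connections": connections}

print(cube_counts(2))
# -> {'elements': 8, 'buses': 12, 'connections': 24}
```

For n = 2 this gives the 12 buses worked out by hand (4 per direction). Counted this way, the attachments come to 3 per element, i.e. 3n**3 in total.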
But the advantage of this is that data has to pass through at most two intermediate data elements on its way from source to destination (one bus hop for each coordinate that differs, so at most three hops). Other paths can be created to get the data from node to node, especially if each bus connection had a FIFO on it to queue up transfers. Also, a data element can pass information along from one bus to another with very little overhead.

I know this is rough, but if the net kicks around the idea, we may all one day (as a collective group) file for a patent (but I doubt it...)

Peter Barada
ima!pbear!peterb

PS "its a long day, and it ain't going any faster..."