Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!hubcap!prins
From: prins@prins.cs.unc.edu (Jan Prins)
Newsgroups: comp.parallel
Subject: Re: SIMD Extentions
Message-ID: <5903@hubcap.clemson.edu>
Date: 3 Jul 89 13:40:43 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 58
Approved: parallel@hubcap.clemson.edu

In article <5886@hubcap.clemson.edu>, jps@cat.cmu.edu (James Salsman) writes:
> I really hate having to deal with indirect addressing on
> most SIMD machines.  I wish someone would build a SIMD array
> using PE's with address buffers. [...]

Early proposals for SIMD parallel computers included indirect addressing.
But when you build a massively parallel processor, the wiring to bring 65K
(or however many) individual addresses out to the memories from the PEs is
daunting.

> The way I would hack an address buffer in to the CM is by
> employing a shift register added to each PE.
>
> (1) Add a new nanoinstruction pin [or two] that selects
>     memory input to the ALU between "Address A [or B]"
>     from the instruction pins and the contents of the
>     indirection register.
>
> (2) Add a new nanoinstruction pin that causes the output
>     from the ALU to be shifted into the indirection register.
>
> That's all there is to it.  Two or three new pins, a shift
> register, and PE memory indirection takes 13 nanocycles
> instead of a zillion.

Where would this shift register reside?  If it is on chip with the PEs,
then you suddenly have a lot of extra address lines to bring off chip --
with 16 PEs, and 16 bits of local addressing that amounts to 256 extra
wires!  If the register is off-chip, say in the memory, then you can fill
it bit-serially without extra wires but you need logic to use it as an
address.  I was under the impression that TMC used standard memory parts
(or was that only for the CM-1?), so the latter approach would be extremely
cumbersome in that setting.

> I am sure that a similar thing could be done to other SIMD
> architectures. [...]
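The pin-count argument above (16 PEs per chip, 16 bits of local address)
is easy to check with a short sketch.  The function name is hypothetical
and the model is deliberately crude: it assumes every PE drives its own
full address off chip in parallel, with no multiplexing of address lines.

```python
def extra_address_pins(pes_per_chip: int, address_bits: int) -> int:
    """Off-chip address wires if each PE exports its own full address.

    Assumption: one dedicated wire per address bit per PE, no sharing.
    """
    return pes_per_chip * address_bits


# The figures quoted in the post: 16 PEs per chip, 16-bit local addresses.
pins = extra_address_pins(16, 16)
print(pins)  # 256
```

Under the same assumption the cost grows linearly in both parameters, so
halving the address width or the PEs per chip only halves the wire count.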

There are examples of massively-parallel SIMD architectures that support
indirect addressing.  One of them is BLITZEN, an extension of MPP that
permits the contents of the PE shift register to be used as a local
modification to the global address.  The wiring problem is solved by
placing PEs and memories on the same chip, although this approach limits
the size of local memory so that very fast I/O to external memory is
required.  The current BLITZEN design places 128 PEs, each with 1K of
local memory, per chip.

> :James P. Salsman (jps@CAT.CMU.EDU)

Jan Prins  (prins@cs.unc.edu)
Dept. of Computer Science
UNC - Chapel Hill

Blevins, Davis, Heaton, Reif "BLITZEN: A Highly Integrated Massively
    Parallel Machine", Frontiers Mass. Par. Comp. 1988.
Heaton, Blevins "BLITZEN: A VLSI Array Processing Chip", IEEE CICC 1989.
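
P.S.  For readers who have not met the idea, "a local modification to the
global address" can be sketched roughly as follows.  This is only an
illustration: how BLITZEN actually combines the shift register with the
broadcast address is not stated above, so the sketch assumes simple
addition modulo the local memory size, and all names are hypothetical.

```python
from typing import List


def simd_indirect_read(memories: List[List[int]],
                       global_address: int,
                       local_offsets: List[int]) -> List[int]:
    """Every PE reads its own memory at (global address + local offset).

    Assumption: the local modification is an add, wrapped to memory size.
    """
    results = []
    for memory, offset in zip(memories, local_offsets):
        address = (global_address + offset) % len(memory)
        results.append(memory[address])
    return results


# Four PEs, eight words each; word j of PE i holds 10*i + j.
memories = [[10 * pe + word for word in range(8)] for pe in range(4)]

# One broadcast address, but each PE supplies a different offset.
values = simd_indirect_read(memories, global_address=2,
                            local_offsets=[0, 1, 2, 7])
print(values)  # [2, 13, 24, 31]
```

With all offsets zero this degenerates to ordinary SIMD addressing, where
every PE touches the same local location.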