Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!hubcap!prins
From: prins@prins.cs.unc.edu (Jan Prins)
Newsgroups: comp.parallel
Subject: Re: SIMD Extentions
Message-ID: <5903@hubcap.clemson.edu>
Date: 3 Jul 89 13:40:43 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 58
Approved: parallel@hubcap.clemson.edu

In article <5886@hubcap.clemson.edu>, jps@cat.cmu.edu (James Salsman) writes:

> I really hate having to deal with indirect addressing on
> most SIMD machines.  I wish someone would build a SIMD array
> using PE's with address buffers.   [...]

Early proposals for SIMD parallel computers included indirect
addressing.  But when you build a massively parallel processor, the
wiring needed to bring 65K (or however many) individual addresses from
the PEs out to the memories is daunting.

> The way I would hack an address buffer in to the CM is by
> employing a shift register added to each PE.
> 
>   (1) Add a new nanoinstruction pin [or two] that selects
>       memory input to the ALU between "Address A [or B]"
>       from the instruction pins and the contents of the
>       indirection register.
> 
>   (2) Add a new nanoinstruction pin that causes the output
>       from the ALU to be shifted into the indirection register.
> 
> That's all there is to it.  Two or three new pins, a shift
> register, and PE memory indirection takes 13 nanocycles
> instead of a zillion.

Where would this shift register reside?  If it is on chip with the PEs,
then you suddenly have a lot of extra address lines to bring off chip
-- with 16 PEs per chip and 16 bits of local addressing each, that
amounts to 16 x 16 = 256 extra wires!  If the register is off chip, say
in the memory, then you can fill it bit-serially without extra wires,
but you need logic there to use it as an address.  I was under the
impression that TMC used standard memory parts (or was that only for
the CM-1?), so the latter approach would be extremely cumbersome in
that setting.
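
To make the pin-count arithmetic above concrete, here is a small sketch.
The function names are made up for illustration, and the 65536-PE figure
is just the "65K" mentioned earlier; nothing here describes an actual
TMC part.

```python
def extra_address_pins(pes_per_chip: int, address_bits: int) -> int:
    """Address lines that must leave the chip if every PE drives its
    own full local address in parallel."""
    return pes_per_chip * address_bits


def chips_needed(total_pes: int, pes_per_chip: int) -> int:
    """Number of PE chips in the machine (rounded up)."""
    return -(-total_pes // pes_per_chip)


if __name__ == "__main__":
    per_chip = extra_address_pins(pes_per_chip=16, address_bits=16)
    chips = chips_needed(total_pes=65536, pes_per_chip=16)
    print("extra pins per chip:", per_chip)
    print("chips in the machine:", chips)
    print("extra address wires machine-wide:", per_chip * chips)
```

The off-chip alternative trades those wires for time: a bit-serial fill
of a 16-bit register needs no new pins but takes on the order of 16
shift steps, plus addressing logic next to the memory.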
 
> I am sure that a similar thing could be done to other SIMD
> architectures.  [...]

There are examples of massively parallel SIMD architectures that
support indirect addressing.  One of them is BLITZEN, an extension of
the MPP that permits the contents of the PE shift register to be used
as a local modification to the global address.  The wiring problem is
solved by placing PEs and memories on the same chip, although this
approach limits the size of local memory, so very fast I/O to external
memory is required.  The current BLITZEN design places 128 PEs, each
with 1K bits of local memory, on each chip.
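
A toy model may help show what local address modification buys.  This
is only a sketch: combining the broadcast address with the shift
register by a masked OR is an assumption made for illustration, as are
all the names below; see the references for the actual mechanism.

```python
def read_by_sweep(memories, local_addrs):
    """Indirect read with only a broadcast (global) address: step
    through every address, and let each PE latch the value when the
    broadcast address matches the one it wants."""
    size = len(memories[0])
    result = [None] * len(memories)
    steps = 0
    for global_addr in range(size):
        steps += 1
        for pe, mem in enumerate(memories):
            if local_addrs[pe] == global_addr:
                result[pe] = mem[global_addr]
    return result, steps


def read_with_local_modification(memories, global_addr, shift_regs, mask):
    """Single-step read: each PE modifies the broadcast address with
    the low bits of its own shift register (assumed: bitwise OR)."""
    return [mem[global_addr | (sr & mask)]
            for mem, sr in zip(memories, shift_regs)]


if __name__ == "__main__":
    # 4 PEs with 16 words each; word a of PE p holds 100*p + a.
    memories = [[100 * pe + a for a in range(16)] for pe in range(4)]
    wanted = [3, 0, 7, 3]
    print(read_by_sweep(memories, wanted))
    print(read_with_local_modification(memories, 0, wanted, 0xF))
```

Both calls return the same values, but the sweep costs one step per
memory location while the locally modified access costs one step in
total.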


> :James P. Salsman (jps@CAT.CMU.EDU)

Jan Prins (prins@cs.unc.edu)
Dept. of Computer Science 
UNC - Chapel Hill

Blevins, Davis, Heaton, Reif, "BLITZEN: A Highly Integrated Massively
  Parallel Machine", Frontiers Mass. Par. Comp. 1988.

Heaton, Blevins, "BLITZEN: A VLSI Array Processing Chip", IEEE CICC 1989.