Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!hubcap!rjc
From: rjc@CS.UCLA.EDU (Robert Collins)
Newsgroups: comp.parallel
Subject: SIMD (CM2) indirect addressing (was SIMD Extentions)
Summary: CM2 has *fast* indirect addressing
Message-ID: <5902@hubcap.clemson.edu>
Date: 3 Jul 89 13:40:29 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 91
Approved: parallel@hubcap.clemson.edu

First of all, I don't work for TMC, so this is totally unofficial.  I
simply have had *lots* of experience hacking on the CM2.  I don't
use *lisp, so I can't give details on *lisp function names...

I am posting rather than responding via e-mail so the CM2 doesn't
get a bum rap (and a bad reputation) from inexperienced users.

In article <5886@hubcap.clemson.edu> jps@cat.cmu.edu (James Salsman) writes:
>I really hate having to deal with indirect addressing on
>most SIMD machines.  I wish someone would build a SIMD array
>using PE's with address buffers.  Just one tiny address
>buffer per processor is all I want... nothing fancy.  As
>long as *ALL* the memory addresses have to come over the
>global instruction stream and thus are the *SAME* for each
>element, a lot of potential processing power is going to
>waste!
>
>For example, on the Connection Machine in *Lisp, indirect
>aref!!'s take FOR EVER.  This is SERIOUSLY slowing down the
>Production System that I wrote in *Lisp (regardless, it's
>faster than CMU/Soar's Production System Machine or any
>other implementation of a production system that I've heard
>about.)  TMC has added something called "sideways arrays" to
>help indirect addressing, but the *Lisp manual is totally
>obscure (so what else is new) and from what I can tell, it
>looks like "sideways" means "spread out over several
>physical processors."  Ack/Pft!
> [ ... ]

I'd tell you to RTFM, but I realize TFM can be hard to R (and
understand)! :-)

Yes, your basic CM array reference takes a long time.  Everyone gets
bitten by this one :-).  It runs in time O(n), where n is the
**number of elements in your array**.  This is *bad* for big arrays.
The aref!! function does not do any actual indirect addressing.  What
it does is loop through all indices in your array.  In each iteration,
it turns on every processor whose index equals the loop index,
does a move from that (common) array address, and goes on to the next
element of the array.
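To see why the cost grows with the array length, here is a minimal Python model of the loop just described.  It is an illustration, not TMC's code: each "processor" holds a private array and an index, the front end broadcasts one array address per step, and a processor only copies data on the step whose address matches its own index.  The name aref_by_broadcast is made up for the example.

```python
from typing import List, Tuple

def aref_by_broadcast(arrays: List[List[int]],
                      indices: List[int]) -> Tuple[List[int], int]:
    """Model an indirect read on a SIMD machine with no per-processor addressing.

    arrays[p] is processor p's private array and indices[p] is the element
    processor p wants.  Every memory address comes from the shared instruction
    stream, so the front end must broadcast each array address in turn and
    enable only the processors that asked for it.
    Returns (value read by each processor, number of broadcast steps).
    """
    n = len(arrays[0])                 # all processors' arrays have the same length
    result = [0] * len(arrays)
    steps = 0
    for i in range(n):                 # front end broadcasts address i
        steps += 1
        for p, want in enumerate(indices):   # conceptually all p at once
            if want == i:                    # context flag set only on matching processors
                result[p] = arrays[p][i]     # ordinary move from the common address i
    return result, steps


arrays = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
values, steps = aref_by_broadcast(arrays, [2, 0, 3])
print(values, steps)   # [12, 20, 33] 4
```

The step count equals the array length even though only three elements are actually wanted, which is exactly the O(n) behaviour that bites everyone.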

With the release of version 5.0 of the CM system software, we now have
`fast' indirect addressing.  It is performed by the chip that does
serial-to-parallel conversion (for feeding the FPA chips).  Since
this hardware works on 32-bit chunks (and there is one for every
32 processing elements), you are stuck with fast arrays with
32-bit elements (or multiples of 32, I guess).  This is a small price
to pay for the speed.  Using the fast (sideways) arrays, an array
reference takes on the order of 2-3 times as long as a 32-bit in-processor
move, rather than growing with the length of the array.  They are
sometimes called `sideways' arrays, because each element in the
per-processor arrays is spread across the memory of 32 processors
(so they can be accessed in parallel).
If you have a lookup table that is the same for all processors (or at
least in groups of 32), you can minimize your memory requirements
by physically sharing the array.
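A rough model may help show why the sideways layout makes per-processor indexing cheap.  The sketch below is a simplified, hypothetical picture of one 32-processor node, not TMC's actual memory map.  It assumes node memory is read one 32-bit word at a time, with bit j of a word belonging to processor j, so an ordinary bit-serial 32-bit value needs 32 word reads plus a 32x32 bit transpose (transpose32, standing in for the serial-to-parallel conversion) before each processor has its whole value.  In the assumed sideways layout, element i of processor p's array is one whole word at base + i*32 + p, and a shared lookup table keeps element i at base + i, so the node does one word read per processor whatever the array length.  The names transpose32 and sideways_read are made up for illustration.

```python
from typing import List, Tuple

WORD = 32  # processors per node and bits per memory word, in this simplified model

def transpose32(words: List[int]) -> List[int]:
    """Transpose a 32x32 bit matrix: bit j of words[i] becomes bit i of out[j].

    Applied to 32 bit-serial words (word b holds bit b of each processor's
    value), it yields one complete 32-bit value per processor; applied again,
    it converts back.
    """
    out = [0] * WORD
    for i, w in enumerate(words):
        for j in range(WORD):
            if (w >> j) & 1:
                out[j] |= 1 << i
    return out

def sideways_read(memory: List[int], base: int, indices: List[int],
                  shared: bool = False) -> Tuple[List[int], int]:
    """Per-processor indirect read from a sideways array within one node.

    Hypothetical layout: element i of processor p's private array is the whole
    word memory[base + i*WORD + p]; a shared table (one copy per node) keeps
    element i at memory[base + i].  The node fetches one word per processor,
    so the number of reads does not depend on the array length.
    """
    values = []
    reads = 0
    for p, i in enumerate(indices):
        addr = base + i if shared else base + i * WORD + p
        values.append(memory[addr])
        reads += 1
    return values, reads


n = 100
private = [(a % WORD) * 1000 + a // WORD for a in range(n * WORD)]  # proc p, elem i -> p*1000 + i
idx = [(p * 7) % n for p in range(WORD)]
vals, reads = sideways_read(private, 0, idx)
print(vals[:3], reads)   # [0, 1007, 2014] 32
```

With a 100-element array the read still costs 32 word fetches per node, where the aref!! loop described earlier needs one broadcast step per element.  The shared variant stores n words per node instead of 32*n, which is the memory saving you get from physically sharing a lookup table.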

James, I am sure that TMC's customer support group will be able to
give you details about how to access both fast arrays and fast
shared arrays from *lisp.  Their email address is csg@think.com.

I completely agree that indirect addressing is an extremely important
aspect of a SIMD architecture.  In fact, I have yet to write a
program for the CM that didn't beg for indirect addressing.

Although TMC perhaps could have handled indirect addressing in a more
general way, I feel that their efforts in this direction have been
quite adequate.

I have done some pretty bizarre things with the CM2, and it has been
a great piece of hardware.  My first project was to study the 
performance issues involved in the emulation of MIMD-style computation
on the CM (a SIMD machine).  This type of emulation is critically
dependent on indirect addressing.  By the way, I realize that MIMD on
the CM is a really brain-damaged thing to do, but it works *very* well.
I still haven't figured out many MIMD applications that can use a million
or more processors :-).  I am convinced that the reverse (emulation of a
SIMD machine on a MIMD machine) would be much more difficult and orders
of magnitude slower.  My latest project has been the simulation of
populations of artificial animals (yes, Artificial Life).  This is also
quite dependent on indirect addressing.  There is no way that an existing
MIMD computer can match the CM for this type of application!  When I first
started working on the CM, I had trouble thinking of ways to use all those
processors.  Now, I have trouble thinking of ways to make my problems
small enough to fit on the CM!  Oh well....
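Why MIMD emulation leans so hard on indirect addressing is easiest to see in a toy interpreter.  In the hypothetical sketch below (not the emulator described above), every processor keeps its own program, program counter and accumulator.  Each emulated instruction cycle has two parts: a fetch, where each processor reads its own program memory at its own pc (a different address per processor, i.e. indirect addressing), and an execute phase, where the front end broadcasts each opcode of the instruction set in turn and only the processors holding that opcode act.  The three-instruction set (addi, jnz, halt) and the names simd_step and run are invented for illustration.

```python
from typing import List, Tuple

Instr = Tuple[str, int]
OPCODES = ("addi", "jnz", "halt")   # hypothetical toy instruction set

def simd_step(programs: List[List[Instr]], pcs: List[int], accs: List[int]) -> None:
    """Execute one emulated MIMD instruction on every processor."""
    # Fetch: each processor reads its OWN program at its OWN pc.  This is the
    # indirect access; with only broadcast addresses it would cost a loop over
    # every program address.
    fetched = [programs[p][pcs[p]] for p in range(len(pcs))]
    # Execute: broadcast each opcode in turn; only matching processors are active.
    for op in OPCODES:
        for p, (code, arg) in enumerate(fetched):
            if code != op:
                continue
            if op == "addi":                 # acc += arg
                accs[p] += arg
                pcs[p] += 1
            elif op == "jnz":                # jump to arg if acc != 0
                pcs[p] = arg if accs[p] != 0 else pcs[p] + 1
            # "halt": leave pc where it is

def run(programs: List[List[Instr]], accs: List[int],
        max_steps: int = 1000) -> Tuple[List[int], int]:
    """Step all processors until every one sits on a halt instruction."""
    pcs = [0] * len(programs)
    accs = list(accs)
    for step in range(max_steps):
        if all(programs[p][pcs[p]][0] == "halt" for p in range(len(pcs))):
            return accs, step
        simd_step(programs, pcs, accs)
    raise RuntimeError("emulated programs did not halt")


countdown = [("addi", -1), ("jnz", 0), ("halt", 0)]   # loop until acc == 0
add_five = [("addi", 5), ("halt", 0)]
print(run([countdown, add_five], [3, 1]))   # ([0, 6], 6)
```

Each emulated cycle costs one indirect fetch plus one pass over the instruction set, so a cheap per-processor fetch is what keeps this kind of emulation practical.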

I hope this has helped to clear up the confusion about indirect
addressing on the CM.

Rob
-------------------------------------------------------------------------------
rjc@cs.ucla.edu	            C++/Paris on the CM2:  Object Oriented SIMD madness
-------------------------------------------------------------------------------