Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!hubcap!rjc
From: rjc@CS.UCLA.EDU (Robert Collins)
Newsgroups: comp.parallel
Subject: SIMD (CM2) indirect addressing (was SIMD Extentions)
Summary: CM2 has *fast* indirect addressing
Message-ID: <5902@hubcap.clemson.edu>
Date: 3 Jul 89 13:40:29 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 91
Approved: parallel@hubcap.clemson.edu

First of all, I don't work for TMC, so this is totally unofficial.
I simply have had *lots* of experience hacking on the CM2. I don't
use *lisp, so I can't give details on *lisp function names...

I am posting rather than responding via e-mail so the CM2 doesn't
get a bum rap (and a bad reputation) from inexperienced users.

In article <5886@hubcap.clemson.edu> jps@cat.cmu.edu (James Salsman) writes:
>I really hate having to deal with indirect addressing on
>most SIMD machines. I wish someone would build a SIMD array
>using PE's with address buffers. Just one tiny address
>buffer per processor is all I want... nothing fancy. As
>long as *ALL* the memory addresses have to come over the
>global instruction stream and thus are the *SAME* for each
>element, a lot of potential processing power is going to
>waste!
>
>For example, on the Connection Machine in *Lisp, indirect
>aref!!'s take FOR EVER. This is SERIOUSLY slowing down the
>Production System that I wrote in *Lisp (regardless, it's
>faster than CMU/Soar's Production System Machine or any
>other implementation of a production system that I've heard
>about.) TMC has added something called "sideways arrays" to
>help indirect addressing, but the *Lisp manual is totally
>obscure (so what else is new) and from what I can tell, it
>looks like "sideways" means "spread out over several
>physical processors." Ack/Pft!
>
[ ... ]

I'd tell you to RTFM, but I realize TFM can be hard to R (and
understand)! :-)

Yes, your basic CM array reference takes a long time. Everyone gets
bitten by this one :-).
It runs in time O(n), where n is the **number of elements in your
array**. This is *bad* for big arrays. The aref!! function does not
do any actual indirect addressing. What it does is loop through all
indices in your array. In each iteration, it turns on every processor
that has an index equal to the loop index, does a move operation, and
goes on to the next element of the array.

With the release of version 5.0 of the CM system software, we now
have `fast' indirect addressing. It is performed by the chip that
does serial-to-parallel conversion (for feeding the FPA chips). Since
this hardware works on 32-bit chunks (and there is one for every 32
processing elements), you are stuck with fast arrays with 32-bit
elements (or multiples of 32, I guess). This is a small price to pay
for the speed. Using the fast (sideways) arrays, an array reference
takes on the order of 2-3 times as long as a 32-bit in-processor move.
They are sometimes called `sideways' arrays, because each element in
the per-processor arrays is spread across the memory of 32 processors
(so they can be accessed in parallel). If you have a lookup table
that is the same for all processors (or at least in groups of 32), you
can minimize your memory requirements by physically sharing the array.

James, I am sure that TMC's customer support group will be able to
give you details about how to access both fast arrays and fast shared
arrays from *lisp. Their email address is csg@think.com.

I completely agree that indirect addressing is an extremely important
aspect of a SIMD architecture. In fact, I have yet to write a program
for the CM that didn't beg for indirect addressing. Although TMC
perhaps could have handled indirect addressing in a more general way,
I feel that their efforts in this direction have been quite adequate.

I have done some pretty bizarre things with the CM2, and it has been a
great piece of hardware.
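
For anyone who wants to see why the slow aref!! is O(n), here is a
minimal Python sketch of the loop described earlier in this post. It
is only an emulation of the idea, not TMC's code: the names slow_aref
and fast_aref, the list-of-lists layout (one small array per
processing element), and the step counters are all assumptions made up
for illustration. The "fast" version is counted as a single step
purely for comparison; it says nothing about how the hardware does it.

```python
def slow_aref(arrays, indices):
    """Emulate the serialized loop: one broadcast step per array slot.

    arrays[pe]  -- the per-processor array held by processing element pe
    indices[pe] -- the index that pe wants to read from its own array
    Returns (per-PE results, number of broadcast steps).
    """
    n = len(arrays[0])                 # every PE holds an array of length n
    result = [None] * len(arrays)
    steps = 0
    for slot in range(n):              # front end walks every possible index
        steps += 1
        # On the real machine this inner loop is one SIMD step: all PEs see
        # the same address (slot); only those whose index matches are active.
        for pe, idx in enumerate(indices):
            if idx == slot:
                result[pe] = arrays[pe][slot]
    return result, steps


def fast_aref(arrays, indices):
    """Hypothetical true indirect addressing: each PE uses its own index."""
    return [arr[idx] for arr, idx in zip(arrays, indices)], 1


if __name__ == "__main__":
    arrays = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
    indices = [2, 0, 3]
    print(slow_aref(arrays, indices))  # same values, n broadcast steps
    print(fast_aref(arrays, indices))  # same values, counted as one step
```

The point of the sketch is that the step count of slow_aref depends
only on the array length, never on how many processors are asking, so
a 1000-element table costs 1000 masked moves no matter what.
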
My first project was to study the performance issues involved in the
emulation of MIMD-style computation on the CM (a SIMD machine). This
type of emulation is critically dependent on indirect addressing. By
the way, I realize that MIMD on the CM is a really brain damaged thing
to do, but it works *very* well. I still haven't figured out many
MIMD applications that can use a million or more processors :-). I am
convinced that the reverse (emulation of a SIMD machine on a MIMD
machine) would be much more difficult and orders of magnitude slower.

My latest project has been the simulation of populations of artificial
animals (yes, Artificial Life). This is also quite dependent on
indirect addressing. There is no way that an existing MIMD computer
can match the CM for this type of application!

When I first started working on the CM, I had trouble thinking of ways
to use all those processors. Now, I have trouble thinking of ways to
make my problems small enough to fit on the CM! Oh well....

I hope this has helped to clear up the confusion about indirect
addressing on the CM.

Rob

-------------------------------------------------------------------------------
rjc@cs.ucla.edu     C++/Paris on the CM2: Object Oriented SIMD madness
-------------------------------------------------------------------------------