Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!rice!uw-beaver!uw-june!rik
From: rik@cs.washington.edu (Rik Littlefield)
Newsgroups: comp.arch
Subject: Re: Shared Memory vs. Distributed Systems
Keywords: Shared, distributed, message
Message-ID: <9605@june.cs.washington.edu>
Date: 26 Oct 89 18:46:33 GMT
References: <20764@usc.edu> <1646@ncrcce.StPaul.NCR.COM>
Organization: U of Washington, Computer Science, Seattle
Lines: 57

In article <1646@ncrcce.StPaul.NCR.COM>, pasek@ncrcce.StPaul.NCR.COM (Michael A. Pasek) writes:
[about the transputer]
> Although you describe this as "message passing" between the computers,
> it sounds to me like you have one "global" memory which is controlled by the
> "dedicated DMA machine", and this memory is "shared" by all the "local"
> processors. What difference does it make whether you set some register in
> your "dedicated DMA machine" and tell it to move some data to another
> location, or just set an address latch in your micro somewhere and do a
> "store" instruction ?
>

It matters a great deal, both in raw performance and in how you program
the beasts.

On a transputer, 10 microseconds is what, 30 instructions? Say you're in
a tiny network, average 3 hops each out and back. That's 6 hops, 180
instructions latency. We can quibble about the numbers, but the point is,
it's a long time.

What this means is that it's dangerous to think "when I need this datum,
I'll just go get it". That programming model works OK on the "shared
memory" machines, which have much lower latency. On a transputer array,
it gives out quickly as you try programs that do progressively more
sharing. The same thing happens on any other existing distributed memory
machine. Of course, as several posters have pointed out, if each node has
enough things to do, you may be able to hide the latency.

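To make the latency arithmetic above concrete, here is a rough sketch in
Python. It assumes the 10 microseconds is a per-hop cost and that the node
runs about 3 instructions per microsecond (which is what "10 microseconds
is what, 30 instructions?" implies). The function name and its parameters
are made up for illustration and are not measured transputer figures.

```python
# Back-of-the-envelope cost of one remote fetch on a message passer.
# Assumptions (illustrative only): 10 us per hop, ~3 instructions per us.

def remote_fetch_cost(hops_each_way, per_hop_us=10, instr_per_us=3):
    """Return (total hops, latency in us, instructions of latency)."""
    total_hops = 2 * hops_each_way          # request out, reply back
    latency_us = total_hops * per_hop_us    # time spent in the network
    stalled_instr = latency_us * instr_per_us
    return total_hops, latency_us, stalled_instr

hops, usec, instrs = remote_fetch_cost(3)   # "tiny network", 3 hops each way
print(hops, "hops,", usec, "us,", instrs, "instructions of latency")
```

With 3 hops each way this gives the 6 hops and 180 instructions quoted
above. Changing per_hop_us or instr_per_us is the "quibbling about the
numbers" part.
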
Perhaps a more fundamental difference is that shared memory machines
provide hardware to guarantee that all processors have a consistent view
of memory. Cooperation between processors can be controlled just by
synchronizing. There's no need to explicitly update other processors'
copies of shared data. With the distributed memory machines, no processor
will find out about an update until it asks or another one tells it. The
programmer can be insulated from this necessity by the compiler and/or
runtime system, as with Kai Li's shared virtual memory (why isn't it
"virtual shared memory"?), but the performance penalty can be severe if
you're unlucky about the sharing patterns.

It looks to me like the distributed memory machines are pretty good at
running three kinds of programs. First are those that just don't require
much write-sharing. Second are those with communication patterns that can
be mapped to nearest-neighbor links, assuming that you're lucky enough to
have a low-overhead machine like the transputer. Third are those in which
the latency can be hidden and overhead amortized by handling lots of
independent store/fetch requests at once. (Combinations of the above are
best of all ;-)

Using that third approach, there seem to be quite a few applications that
can run well even with high communications overhead. (See Fox's book,
"Solving Problems on Concurrent Processors".) But lack of software
support is a big problem. Programs like that can be written using
explicit message passing, but it's not easy.

Several people, including myself, are working on compilation and runtime
techniques to let you write programs using a shared memory data model,
but have them execute efficiently on a message passer. It's not an easy
problem (Ph.D. thesis material), but there are several promising
approaches. Stay tuned, but don't hold your breath.

--Rik