Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!psuvax1!rutgers!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Shared Memory
Summary: MIMD memory sharing can be inexpensive.
Message-ID: <10678@pt.cs.cmu.edu>
Date: 6 Oct 90 00:17:00 GMT
References: <1990Oct1.200613.635@tera.com> <1990Oct3.013509.1470@news.iastate.edu> <10651@pt.cs.cmu.edu> <11182@life.ai.mit.edu>
Organization: Carnegie-Mellon University, CS/RI
Lines: 61

In article <11181@life.ai.mit.edu> misha@teenage-mutant.ai.mit.edu
(Mike Bolotski) writes:
>While sharing at the page level across a network works, increasing
>the number of workstations to say, 1000, will seriously impede
>performance.

Well, yes. However, I prefer to talk about the cost difference
between a large message-based MIMD machine, and a large shared-memory
MIMD machine. I assume that the message-based machine is reasonably
well designed. Hence, asking it to transport page-sized messages is
not unreasonable: at worst, the message traffic will be dissimilar to
the workload expected by the designers.

>>If a machine is to have finer-grained sharing, then a number of
>>efficiency issues come up. I am aware of a variety of approaches,
>>each with its own penalty or price. I believe that eventually, one
>>of these will be implemented in a way that adds essentially nothing
>>to the machine's manufacturing cost.

>I disagree. Shared memory automatically creates contention
>for frequently used data. Caching is a possible solution, but
>cache coherence protocols for fine grained machines become incredibly
>complex, and further impact performance.

As I pointed out, complexity does not necessarily make a machine more
expensive to build. What costs are things like extra pins and wider
connectors. A cache coherence protocol mostly involves adding control
hardware that can send and receive messages. We were already assuming
some sort of hardware support for messages. (If the enlarged cache
line tags require extra SRAM chips, then that's also a cost. Also,
message hardware can be semi-adequate without having MMU access,
whereas the sharing hardware definitely talks with the MMU.)

>Further, shared memory does not obey the natural laws of physics,
>in a sense. The abstract shared memory model pretends that there
>is a large number of communication paths to a single point. This
>simply isn't true, and additional hardware is required to provide
>this illusion to the programmer. This hardware costs time and
>dollars.

Message based systems offer the same pretense. Any decent
message-based MIMD machine will allow a message to go quickly from
any node, to any other node. Yes, there are nearest-neighbor designs
out there. But they don't pretend to generality, and inevitably,
library software is written for forwarding messages. The original
NCUBE machine was like this, but you will notice that the newer
NCUBE-2 has hardware support for forwarding.

>Parallel programs designed to reflect the underlying physical
>reality will operate faster than those that work in a virtual
>model. I suppose an IMHO is in order here.

IMHO you're right. A parallel program, of either kind, ignores the
underlying truths at its peril. The interconnect can have traffic
jams. If 1023 nodes are all requesting decisions from a single node,
then everybody gets to do a lot of waiting. And so on.
--
Don		D.C.Lindsay