Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!psuvax1!rutgers!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Shared Memory
Summary: MIMD memory sharing can be inexpensive.
Message-ID: <10678@pt.cs.cmu.edu>
Date: 6 Oct 90 00:17:00 GMT
References: <1990Oct1.200613.635@tera.com> <1990Oct3.013509.1470@news.iastate.edu> <10651@pt.cs.cmu.edu> <11182@life.ai.mit.edu>
Organization: Carnegie-Mellon University, CS/RI
Lines: 61


In article <11181@life.ai.mit.edu> misha@teenage-mutant.ai.mit.edu 
	(Mike Bolotski) writes:
>While sharing at the page level across a network works, increasing
>the number of workstations to say, 1000, will seriously impede
>performance.

Well, yes.  However, I prefer to talk about the cost difference
between a large message-based MIMD machine, and a large shared-memory
MIMD machine.  I assume that the message-based machine is reasonably
well designed.  Hence, asking it to transport page-sized messages is
not unreasonable: at worst, the message traffic will be dissimilar to
the workload expected by the designers.

>>If a machine is to have finer-grained sharing, then a number of
>>efficiency issues come up.  I am aware of a variety of approaches,
>>each with its own penalty or price.  I believe that eventually, one
>>of these will be implemented in a way that adds essentially nothing
>>to the machine's manufacturing cost.

>I disagree.  Shared memory automatically creates contention
>for frequently used data.  Caching is a possible solution, but
>cache coherence protocols for fine grained machines become incredibly
>complex, and further impact performance.

As I pointed out, complexity does not necessarily make a machine more
expensive to build.  What costs money is things like extra pins and
wider connectors.  A cache coherence protocol mostly involves adding
control hardware that can send and receive messages.  We were already
assuming some sort of hardware support for messages.

(If the enlarged cache line tags require extra SRAM chips, then
that's also a cost.  Also, message hardware can be semi-adequate
without having MMU access, whereas the sharing hardware definitely
talks with the MMU.)
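
To make the "coherence is mostly message handling" point concrete,
here is a minimal sketch of a directory-style write-invalidate scheme,
written as plain message sends.  It is an illustration under heavy
assumptions, not any particular machine's protocol: the class name
Directory, the message kinds DATA / INVALIDATE / GRANT, and the
absence of acknowledgements and dirty write-backs are all
simplifications of mine.

```python
from collections import defaultdict


class Directory:
    """Hypothetical home-node bookkeeping: which nodes hold each line."""

    def __init__(self):
        self.sharers = defaultdict(set)  # line address -> set of node ids
        self.log = []                    # messages "sent", kept for inspection

    def send(self, dest, kind, addr):
        # Stand-in for the message hardware the machine already has.
        self.log.append((dest, kind, addr))

    def read(self, node, addr):
        # A reader is recorded as a sharer and is sent a copy of the line.
        self.sharers[addr].add(node)
        self.send(node, "DATA", addr)

    def write(self, node, addr):
        # Every other sharer is told to drop its copy, then the writer
        # becomes the only holder.  (No acks: assumed away for brevity.)
        for other in sorted(self.sharers[addr] - {node}):
            self.send(other, "INVALIDATE", addr)
        self.sharers[addr] = {node}
        self.send(node, "GRANT", addr)
```

Note that the only per-line state is the sharer set, which is the
"enlarged tag" cost mentioned above; everything else is traffic on an
interconnect that was assumed to exist anyway.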

>Further, shared memory does not obey the natural laws of physics,
>in a sense.  The abstract shared memory model pretends that there
>is a large number of communication paths to a single point. This
>simply isn't true, and additional hardware is required to provide
>this illusion to the programmer. This hardware costs time and
>dollars.

Message-based systems offer the same pretense.  Any decent
message-based MIMD machine will allow a message to go quickly from
any node to any other node.  Yes, there are nearest-neighbor designs
out there.  But they don't pretend to generality, and inevitably,
library software gets written to forward messages.  The original NCUBE
machine was like this, but you will notice that the newer NCUBE-2 has
hardware support for forwarding.
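
For a feel of what such forwarding software has to do, here is a
small sketch of dimension-order routing on a hypercube, where each
node is wired only to the nodes whose addresses differ from its own in
one bit.  This is the textbook scheme, offered as an assumption; I am
not claiming it is what the NCUBE library or the NCUBE-2 hardware
actually does.  The function names next_hop and route are made up for
the example.

```python
def next_hop(current, dest):
    """Return the neighbor to forward to, or current if already there."""
    diff = current ^ dest          # bits in which the two addresses differ
    if diff == 0:
        return current
    lowest = diff & -diff          # isolate the lowest differing bit
    return current ^ lowest        # cross that one dimension


def route(src, dest):
    """List every node a message visits on its way from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest))
    return path
```

The hop count is the number of differing address bits, so a message
between two nodes in a 10-dimensional cube is forwarded through at
most 9 intermediate nodes.  In software, each of those nodes burns CPU
time on the relay, which is the cost that forwarding hardware removes.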

>Parallel programs designed to reflect the underlying physical
>reality will operate faster than those that work in a virtual
>model. I suppose an IMHO is in order here.

IMHO you're right.  A parallel program, of either kind, ignores the
underlying truths at its peril.  The interconnect can have traffic
jams.  If 1023 nodes are all requesting decisions from a single node,
then everybody gets to do a lot of waiting.  And so on.
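
A back-of-the-envelope sketch of that last case, under assumptions
that are mine and deliberately crude: all requests arrive at once, the
single node answers them one at a time, each answer takes the same
service_time, and the interconnect itself is free.  The function name
hot_spot_waits is made up for the example.

```python
def hot_spot_waits(requesters, service_time):
    """Return (mean, worst) completion time for requests to one node."""
    # The i-th request served finishes after (i + 1) service times.
    finish = [(i + 1) * service_time for i in range(requesters)]
    return sum(finish) / requesters, finish[-1]
```

With 1023 requesters and a service time of 1 unit, the mean wait is
512 units and the last requester waits 1023.  That holds whether the
requests are messages or shared-memory references; the serialization
is at the single node, not in the programming model.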
-- 
Don		D.C.Lindsay