Path: utzoo!attcan!uunet!husc6!rice!titan!retrac
From: retrac@titan.rice.edu (John Carter)
Newsgroups: comp.arch
Subject: Re: Is Shared Memory Necessary?
Message-ID: <685@thalia.rice.edu>
Date: 10 May 88 16:31:01 GMT
References: <503@xios.XIOS.UUCP> <2676@pdn.UUCP> <674@cernvax.UUCP> <9559@sol.ARPA>
Sender: usenet@rice.edu
Reply-To: retrac@rice.edu (John Carter)
Organization: Rice University, Houston
Lines: 39

In article <9559@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes:
>>Surely the highest bandwidth is achieved
>>when each processor has its own memory which it shares with no one else?  It
>>also makes the hardware a lot smaller. ...  shared-memory is not necessary;
>>it's a software issue that shouldn't be solved in hardware.
>
>Yes, the highest bandwidth is achieved when each processor has exclusive
>memory.  However, processes on different processors must still communicate with
>each other.  Non-shared memory communication typically costs two orders of
>magnitude more than shared memory communication.  What's worse, even when
>processes are on the same processor, software engineering issues often require
>that they communicate via the same slow inter-processor mechanism.  So the
>fastest time to solve a problem may well lie with shared memory even though its
>bandwidth is lower.  Communication is a software issue.  Making it cheap
>requires hardware.  My conclusion is that shared-memory is not necessary, but
>that it is worth its cost.

I think that it's currently worth the cost, but I doubt seriously that it'll
be a realistic approach as we try to increase the number of processors beyond
several dozen.  A shared (hardware) memory that needed to handle thousands of
processors would be much slower than a conventional memory (how much slower
would depend on the actual architecture and what decisions you made about the
semantics of memory access).  The RP-3 project at IBM is doing some interesting
work on large shared memory architectures.  Their work aside, I think that
you should be able to get "optimal" performance by providing very fast
distributed memory communication, hardware support for IPC (interprocess[or]
communication), intelligent/clever communication protocols, and intelligent
compilers that can schedule the IPC so as to remove or reduce the delay
associated with waiting for remote memory access.  Kai Li (at Princeton,
I believe) has done some very good work on implementing shared memory
on top of a distributed memory machine/system (i.e., the architecture is
distributed memory, but the programmer's view is of shared memory).
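
To make the point about scheduling IPC a little more concrete, here is a
back-of-the-envelope sketch of how much delay could be hidden if the fetch
for the next item is issued before computing on the current one.  The
function names and the unit costs are made up for illustration, and the
model assumes only one remote fetch is outstanding at a time; it is not a
measurement of any real machine.

```python
def blocking_time(n_items, remote_latency, compute_time):
    """Total time when every step waits for its remote fetch, then computes."""
    return n_items * (remote_latency + compute_time)


def overlapped_time(n_items, remote_latency, compute_time):
    """Total time when the fetch for item i+1 is issued before computing item i.

    Assumes a single outstanding fetch (simple double buffering).
    """
    if n_items == 0:
        return 0
    # The first fetch cannot be hidden behind any computation.
    startup = remote_latency
    # Each later step costs whichever is longer: the fetch or the computation.
    steady = (n_items - 1) * max(remote_latency, compute_time)
    # The last item still has to be computed after its data arrives.
    return startup + steady + compute_time


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units.
    n, latency, compute = 100, 100, 100
    print("blocking:  ", blocking_time(n, latency, compute))
    print("overlapped:", overlapped_time(n, latency, compute))
```

Under these assumptions the overlapped schedule approaches half the blocking
time when a fetch and a computation cost about the same, and the gain shrinks
as either cost comes to dominate the other.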


John Carter               Internet: retrac@rice.edu
Dept of Computer Science  CSNET:    retrac@rice.edu
Rice University           UUCP:     {internet node or backbone}!rice!retrac
Houston, TX