Path: utzoo!attcan!uunet!husc6!bloom-beacon!gatech!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Is Shared Memory Necessary?
Summary: Some BIG shared memory address space machines
Message-ID: <8125@pur-ee.UUCP>
Date: 17 May 88 16:47:47 GMT
References: <503@xios.XIOS.UUCP> <2676@pdn.UUCP> <674@cernvax.UUCP> <685@thalia.rice.edu>
Organization: Purdue University Engineering Computer Network
Lines: 95

In article <685@thalia.rice.edu>, retrac@titan.rice.edu (John Carter) writes:
> In article <9559@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
> >In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes:
> >>Surely the highest bandwidth is achieved
> >>when each processor has its own memory which it shares with no one else?  It
> >>also makes the hardware a lot smaller. ...  shared-memory is not necessary;
> >>it's a software issue that shouldn't be solved in hardware.
> >
> >Yes, the highest bandwidth is achieved when each processor has exclusive
> >memory.  However, processes on different processors must still communicate with
> >each other.  Non-shared memory communication typically costs two orders of
> >magnitude more than shared memory communication.  What's worse, even when
> >processes are on the same processor, software engineering issues often require
> >that they communicate via the same slow inter-processor mechanism.  So the
...
> several dozen.  A shared (hardware) memory that needed to handle thousands of
> processors would be much slower than a conventional memory (how much slower
> would depend on the actual architecture and what decisions you made about the
> semantics of memory access).  The RP-3 project at IBM is doing some interest-
> ing work on large shared memory architectures.  Their work aside, I think that
...
> associated with waiting for remote memory access.  Kai Li (at Princeton,
> I believe) has done some very good work on implementing shared memory
> on top of a distributed memory machine/system (i.e., the architecture is
> distributed memory, but the programmer's view is of shared memory).

The following is a list, in approximate chronological order, of very large
scale shared memory address space (but physically distributed memory) MIMD
machines which have been proposed, simulated, and/or built:

CHoPP		Columbia University HOmogeneous Parallel Processor; a machine
		with a LogN stage smart switch network between processors and
		memory modules.  Each switch contains a pre- and post-arrival
		cache called an RFM:  Repetition Filter Memory.  Multiple
		requests for (at least temporarily) read-only items are
		combined (serviced by the cache, rather than by memory) at
		the point in the net where their paths cross.

Denelcor HEP	Uses microtasking to hide shared memory latency.

NYU Ultra	Instead of RFMs, uses F&O (Fetch and Op) smart switches,
		which can combine semaphore operations in the net, but only
		if they actually collide in a particular switch node.  Does
		nothing about read-only objects; however, it also has a
		local memory for each processor.  F&O happens to be really
		good at doing associative reductions, but not all that good
		at combining semaphore ops because in a MIMD they rarely
		collide (rather, they usually cross paths).

BBN Butterfly	A "dumb" LogN stage network is used, but it is much faster
		than the 68K-family processors of the machine, so it works.

IBM RP3		Research Parallel Processor Prototype; originally to be the
		NYU machine "done right."  Memory is physically local to
		processors but globally addressable, with hardware support
		for address mapping.  Was to have two networks:  slow F&O
		and fast circuit switching...  but only the fast one was
		implemented due to cost constraints.

RFM-MIMD	Son-of-CHoPP.  This machine differs in that it uses RFM+
		switches, which can combine both read and write requests
		which cross paths, and it uses only a single stage net and
		memory modules are local to processor nodes, yet globally
		addressable.  Further, the processor nodes of the MIMD
		constitute VLIW machines tailored to hide occasionally large
		memory latencies.

Cedar		From University of Illinois...  a bunch of Alliants sharing
		a fast global store.  Uses a cluster structure for memory
		interconnection.  Not really as scalable as the others....

BBN Monarch	Next-generation of the Butterfly?

CARP Machine	Compiler-oriented Architecture Research at Purdue Machine;
		vaguely son-of-RFM-MIMD and son-of-PASM.  Lots of compiler
		driven "tricks" used to manage memory....

Horizon/Tera	Burton Smith's new machine (HEP being his old one).  Sort-of
		like RFM-MIMD, but with microtasking and with a mesh net.
		Details not yet available.
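
To make the F&O combining mentioned under NYU Ultra a bit more concrete,
here is a minimal sketch in Python of what a switch node might do when two
fetch-and-add requests for the same address collide in it.  The names
(Memory, CombiningSwitch, fetch_add_pair) are made up for illustration and
are not taken from any of the machines above; real switches also have to
handle queueing, timing, and the reply path through several stages, all of
which this ignores.

```python
class Memory:
    """A single memory module holding integer cells (illustrative only)."""

    def __init__(self):
        self.cells = {}

    def fetch_add(self, addr, inc):
        # Return the old value and add inc to the cell in one step.
        old = self.cells.get(addr, 0)
        self.cells[addr] = old + inc
        return old


class CombiningSwitch:
    """Hypothetical switch node that combines two colliding fetch-and-adds."""

    def __init__(self, memory):
        self.memory = memory
        self.memory_requests = 0  # requests actually forwarded to memory

    def fetch_add_pair(self, addr, inc_a, inc_b):
        # Two requests for the same address meet in this node: forward ONE
        # request carrying the sum, and remember inc_a for the reply path.
        self.memory_requests += 1
        old = self.memory.fetch_add(addr, inc_a + inc_b)
        # Split the reply: A sees the old value, B sees the value it would
        # have seen had A's request been serviced first.
        return old, old + inc_a


mem = Memory()
switch = CombiningSwitch(mem)
reply_a, reply_b = switch.fetch_add_pair(addr=0x10, inc_a=3, inc_b=5)
```

The point of the sketch is only that memory sees one request instead of
two, while each requester still gets a reply consistent with some serial
order.  If the two requests merely cross paths without meeting in the same
node at the same time, nothing gets combined.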

#ifdef	FLAME
		Don't you guys read the earlier postings before posting your
		responses?  I posted something a couple of weeks ago that
		would have cleared up most of the argument, but you guys
		just seem to want to argue, rather than to discuss.
#endif

     __         /|
  _ |  |  __   / |  Compiler-oriented
 /  |--| |  | |  |  Architecture
/   |  | |__| |_/   Researcher from
\__ |  | | \  |     Purdue
    \    |  \  \
	 \      \   Prof. Hank Dietz, Purdue EE