Path: utzoo!attcan!uunet!husc6!bloom-beacon!gatech!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Is Shared Memory Necessary?
Summary: Some BIG shared memory address space machines
Message-ID: <8125@pur-ee.UUCP>
Date: 17 May 88 16:47:47 GMT
References: <503@xios.XIOS.UUCP> <2676@pdn.UUCP> <674@cernvax.UUCP> <685@thalia.rice.edu>
Organization: Purdue University Engineering Computer Network
Lines: 95

In article <685@thalia.rice.edu>, retrac@titan.rice.edu (John Carter) writes:
> In article <9559@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
> >In article <674@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes:
> >>Surely the highest bandwidth is achieved
> >>when each processor has its own memory which it shares with no one else? It
> >>also makes the hardware a lot smaller. ... shared-memory is not necessary;
> >>it's a software issue that shouldn't be solved in hardware.
> >
> >Yes, the highest bandwidth is achieved when each processor has exclusive
> >memory. However, processes on different processors must still communicate with
> >each other. Non-shared memory communication typically costs two orders of
> >magnitude more than shared memory communication. What's worse, even when
> >processes are on the same processor, software engineering issues often require
> >that they communicate via the same slow inter-processor mechanism. So the ...
> several dozen. A shared (hardware) memory that needed to handle thousands of
> processors would be much slower than a conventional memory (how much slower
> would depend on the actual architecture and what decisions you made about the
> semantics of memory access). The RP-3 project at IBM is doing some interesting
> work on large shared memory architectures. Their work aside, I think that ...
> associated with waiting for remote memory access.
> Kai Li (at Princeton, I believe) has done some very good work on
> implementing shared memory on top of a distributed memory machine/system
> (i.e., the architecture is distributed memory, but the programmer's view
> is of shared memory).

The following is a list, in approximate chronological order, of very large
scale shared memory address space (but physically distributed memory) MIMD
machines which have been proposed, simulated, and/or built:

CHoPP           Columbia University HOmogeneous Parallel Processor; a
                machine with a LogN stage smart switch network between
                processors and memory modules. Each switch contains a
                pre- and post-arrival cache called an RFM: Repetition
                Filter Memory. Multiple requests for (at least
                temporarily) read-only items are combined (serviced by the
                cache, rather than by memory) at the point in the net
                where their paths cross.

Denelcor HEP    Uses microtasking to hide shared memory latency.

NYU Ultra       Instead of RFMs, uses F&O (Fetch and Op) smart switches,
                which can combine semaphore operations in the net, but only
                if they actually collide in a particular switch node. Does
                nothing about read-only objects; however, it also has a
                local memory for each processor. F&O happens to be really
                good at doing associative reductions, but not all that good
                at combining semaphore ops, because in a MIMD they rarely
                collide (rather, they usually cross paths).

BBN Butterfly   A "dumb" LogN stage network is used, but it is much faster
                than the 68K-family processors of the machine, so it works.

IBM RP3         Research Parallel Processing Prototype; originally to be
                the NYU machine "done right." Memory is physically local
                to processors but globally addressable, with hardware
                support for address mapping. Was to have two networks:
                slow F&O and fast circuit switching... but only the fast
                one was implemented due to cost constraints.

RFM-MIMD        Son-of-CHoPP.
                This machine differs in that it uses RFM+ switches, which
                can combine both read and write requests that cross paths;
                it uses only a single stage net; and its memory modules
                are local to processor nodes, yet globally addressable.
                Further, the processor nodes of the MIMD constitute VLIW
                machines tailored to hide occasionally large memory
                latencies.

Cedar           From the University of Illinois... a bunch of Alliants
                sharing a fast global store. Uses a cluster structure for
                memory interconnection. Not really as scalable as the
                others....

BBN Monarch     Next generation of the Butterfly?

CARP Machine    Compiler-oriented Architecture Research at Purdue Machine;
                vaguely son-of-RFM-MIMD and son-of-PASM. Lots of
                compiler-driven "tricks" used to manage memory....

Horizon/Tera    Burton Smith's new machine (HEP being his old one). Sort
                of like RFM-MIMD, but with microtasking and with a mesh
                net. Details not yet available.

#ifdef FLAME
Don't you guys read the earlier postings before posting your responses?
I posted something a couple of weeks ago that would have cleared up most
of the argument, but you guys just seem to want to argue, rather than to
discuss.
#endif

Compiler-oriented Architecture Researcher from Purdue
Prof. Hank Dietz, Purdue EE
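To make the NYU Ultra entry's "combine in the net, but only if they actually collide" concrete, here is a minimal Python sketch of what a fetch-and-op switch node does when two fetch-and-add requests to the same address meet in it. It is a simplified software model under stated assumptions, not the Ultracomputer's hardware: `Request`, `Memory`, and `combining_switch` are hypothetical names, the op is addition, and both requests are assumed to arrive at the node at the same time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    requester: str  # hypothetical label for the issuing processor
    addr: int
    increment: int


class Memory:
    """Hypothetical memory module that performs fetch-and-add atomically."""

    def __init__(self):
        self.cells = {}

    def fetch_and_add(self, addr, inc):
        old = self.cells.get(addr, 0)
        self.cells[addr] = old + inc
        return old


def combining_switch(a, b, memory):
    """Route two fetch-and-add requests through one switch node.

    Requests to the same address are merged into a single memory access;
    requests to different addresses are simply forwarded one at a time.
    Returns a dict mapping each requester to the value it gets back.
    """
    if a.addr != b.addr:
        return {
            a.requester: memory.fetch_and_add(a.addr, a.increment),
            b.requester: memory.fetch_and_add(b.addr, b.increment),
        }
    # Combine: forward one request carrying the sum, remember a's increment.
    y = memory.fetch_and_add(a.addr, a.increment + b.increment)
    # De-combine the reply so the result matches "a ran first, then b".
    return {a.requester: y, b.requester: y + a.increment}


mem = Memory()
mem.cells[0] = 10
print(combining_switch(Request("P1", 0, 1), Request("P2", 0, 2), mem))  # {'P1': 10, 'P2': 11}
print(mem.cells[0])  # 13
```

Memory sees one access instead of two, yet P1 and P2 get exactly the values a serial execution would have given them. The same de-combining works for any associative operation, which is why F&O does well at reductions; if the two requests merely cross paths instead of meeting in the same node, nothing is combined.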
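The read combining that the CHoPP entry attributes to its RFMs can be sketched in the same spirit. This is a minimal Python sketch, not CHoPP's actual design: `combine_reads` is a hypothetical function, and it assumes all the requests reach one switch node in the same cycle and that the items being read are (at least temporarily) read-only.

```python
from collections import defaultdict


def combine_reads(requests, memory):
    """Service a batch of read requests that meet at one switch node.

    requests: list of (requester, addr) pairs arriving together.
    memory:   dict mapping addr -> value (assumed read-only here).
    Returns (replies, accesses): each requester's value, and how many
    reads were actually sent on to memory.
    """
    waiting = defaultdict(list)  # addr -> requesters waiting on it
    for requester, addr in requests:
        waiting[addr].append(requester)

    replies = {}
    for addr, requesters in waiting.items():
        value = memory[addr]  # one memory read per distinct address
        for requester in requesters:  # fan the reply back out
            replies[requester] = value
    return replies, len(waiting)


memory = {0: "a", 1: "b"}
replies, accesses = combine_reads([("P1", 0), ("P2", 0), ("P3", 1)], memory)
print(replies, accesses)  # {'P1': 'a', 'P2': 'a', 'P3': 'b'} 2
```

Three requests cost only two memory accesses here; the more processors that read the same item at once, the larger the savings from filtering repeats inside the network rather than at the memory module.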