Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Micro 2000
Message-ID: <13062@pur-ee.UUCP>
Date: 5 Oct 89 15:25:57 GMT
References: <6415@pt.cs.cmu.edu>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 58

In article <6415@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>It is rumored that multiple-CPU chips really are in the works, and
>for 1991, not for 2000.  (No, I don't mean university projects, or
>published designs such as TRAC.)
>
>If they are going to happen, what would people like to see?  How
>should the units communicate?  What about internal vs external
>interrupts; MMU[s]; local caches; semaphores; hardware forks?  
>Any shared-register fans out there?

As an academic researcher in optimizing/parallelizing compilers, I have long
believed that the way to build big machines is as MIMD machines with
physically distributed memory (accessed through a shared address space),
where each node of the MIMD is itself a single-chip VLIW or other fine-grain
processor arrangement.

The first such design I proposed was presented at the Second SIAM Conference
on Parallel Processing for Scientific Computing, November 20, 1985: Henry G.
Dietz and A. David Klappholz, "RISC CPU Design for MIMDs."

The current design is quite different -- the CARP (Compiler-oriented
Architecture Research at Purdue) machine.  We haven't published any details
yet, but we have a fairly complete paper design (plus a simulator, assembler,
and compilers working for at least parts of the machine).  As soon as we
finish the CARP machine technical report, I'll post the reference so that
you can all get copies...  but here is a quick summary:

Full machine:
	A MIMD with ~64 nodes

Node:
	One chip (minimum of 100K transistor equivalents) containing at least:
		4 32-bit Integer Processors (IPs), each with:
			18 instructions, 8 bits explicit control
			16 data CRegs (Cache-REGisters)
			16 4-word instruction CRegs
			Some CRegs shared with other IPs
			Separate global/local memory access busses
			No interrupts allowed (deferred until a Node or IP is free)
		"Smart cache" network interface (based on RFM+)
		Barrier sync. processor
			(Allows asynchronous execution, but also
			lets compiler use static VLIW-like scheduling)
		1 64-bit Float Processor (FP), with:
			~2 instructions (multiply & reciprocal)
			General layout same as IPs...

	Memory mgmt:
		Virtual memory paging (pages are local to node)
		Some explicit control over page handling
		I/O is memory-mapped
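
To illustrate the barrier sync. item in the node list above, here is a rough
software analogy in Python.  It is not the CARP hardware or its instruction
set; the names run_phases, worker, and NUM_IPS are made up for this sketch,
and threads merely stand in for the four IPs.  Each "IP" runs its first phase
at its own pace, then waits at the barrier, so the second phase can rely on
every first-phase result being written, much as a statically scheduled
VLIW-like code sequence would.

```python
import threading

NUM_IPS = 4  # assumption: one thread per integer processor in a node


def run_phases(num_ips=NUM_IPS):
    """Run two phases separated by a barrier; return (partial, totals)."""
    barrier = threading.Barrier(num_ips)
    partial = [0] * num_ips   # written in phase 1, one slot per IP
    totals = [0] * num_ips    # written in phase 2, one slot per IP

    def worker(ip):
        # Phase 1: asynchronous execution; each IP does a different
        # amount of work, so they finish at different times.
        partial[ip] = sum(range((ip + 1) * 1000))
        # Barrier: nobody proceeds until all IPs have arrived.
        barrier.wait()
        # Phase 2: every slot of partial[] is now safe to read.
        totals[ip] = sum(partial)

    threads = [threading.Thread(target=worker, args=(ip,))
               for ip in range(num_ips)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return partial, totals


if __name__ == "__main__":
    partial, totals = run_phases()
    print(partial, totals)
```

Without the barrier.wait() call, an IP that finishes early could read slots
of partial[] that other IPs have not yet written.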

Network:
	Topology...  probably a single-stage (recirculating) network, but not certain
	Switches...  within nodes...  explicit control of caching
	MIMD inter-node synchronization via semaphores in the network
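
For readers less familiar with semaphore-style synchronization between nodes,
a minimal Python sketch follows.  This is only the classic P/V pattern in
software; it says nothing about how the CARP network would implement it, and
run_nodes, node, and the shared counter are hypothetical names for
illustration, with threads standing in for nodes.

```python
import threading


def run_nodes(num_nodes=8, increments=1000):
    """Have several 'nodes' update one shared counter under a semaphore."""
    sem = threading.Semaphore(1)   # binary semaphore guarding the counter
    shared = {"count": 0}

    def node():
        for _ in range(increments):
            sem.acquire()          # P: wait until the semaphore is free
            try:
                shared["count"] += 1   # critical section
            finally:
                sem.release()      # V: let the next waiting node in

    threads = [threading.Thread(target=node) for _ in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run_nodes())
```

Because every update happens between acquire and release, the final count
always equals num_nodes * increments.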

						-hankd@ecn.purdue.edu