Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Micro 2000
Message-ID: <13062@pur-ee.UUCP>
Date: 5 Oct 89 15:25:57 GMT
References: <6415@pt.cs.cmu.edu>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 58

In article <6415@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>It is rumored that multiple-CPU chips really are in the works, and
>for 1991, not for 2000. (No, I don't mean university projects, or
>published designs such as TRAC.)
>
>If they are going to happen, what would people like to see? How
>should the units communicate? What about internal vs external
>interrupts; MMU[s]; local caches; semaphores; hardware forks?
>Any shared-register fans out there?

As an academic optimizing/parallelizing compiler researcher, it has
long been apparent to me that the way to build big machines is as
MIMD machines with physically-distributed memory (accessed by a
shared address space) such that each node in the MIMD is actually a
single-chip VLIW or other fine-grain processor arrangement.  The
first such design I proposed was presented at the Second SIAM
Conference on Parallel Processing for Scientific Computing, November
20, 1985:  Henry G. Dietz and A. David Klappholz, "RISC CPU Design
for MIMDs."

The current design is quite different -- the CARP (Compiler-oriented
Architecture Research at Purdue) machine.  We haven't got any details
out in papers yet, but we have a fairly complete paper design (also a
simulator, assembler, and compilers working for at least parts of the
machine).  As soon as we finish THE CARP machine technical report,
I'll post the reference so that you can all get copies... but here is
a quick summary:

Full machine:   A MIMD with ~64 nodes
Node:           One Chip (min. of 100K trans. equiv.)
                containing at least:
                4 32-bit Integer Processors (IPs), each with:
                    18 instructions, 8 bits explicit control
                    16 data CRegs (Cache-REGisters)
                    16 4-word instruction CRegs
                    Some CRegs shared with other IPs
                    Separate global/local memory access busses
                    No interrupts allowed (delayed for free Node or IP)
                "Smart cache" network interface (based on RFM+)
                Barrier sync. processor
                    (Allows asynchronous execution, but also lets
                     compiler use static VLIW-like scheduling)
                1 64-bit Float Processor (FP), with:
                    ~2 instructions (mul & recip)
                    General layout as IPs...
Memory mgmt:    Virtual memory paging (pages are local to node)
                Some explicit control over page handling
                I/O is memory-mapped
Network:        Topology... probably single-stage (recirc.) net,
                    but not certain
                Switches... within nodes... explicit control of caching
                MIMD inter-node sync. as semaphores in network

                                        -hankd@ecn.purdue.edu
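
The barrier-sync entry in the node summary can be illustrated with a
small software sketch.  This is not CARP code and says nothing about
how the hardware is built; it is a minimal Python illustration,
assuming four threads stand in for the four IPs and the standard
library's threading.Barrier stands in for the barrier sync. processor.
The names NUM_IPS, run_ip, and run_node are made up for the example.
Each thread runs phase 1 at its own pace, then all of them meet at the
barrier.  After that, phase 2 can read a neighbor's phase-1 result
without a lock, because the ordering of the two phases is fixed in
advance, which is the kind of guarantee static scheduling relies on.

```python
import random
import threading
import time

NUM_IPS = 4  # hypothetical: one thread per integer processor


def run_node(num_ips=NUM_IPS):
    """Run a two-phase computation with one barrier between the phases."""
    barrier = threading.Barrier(num_ips)
    phase1 = [None] * num_ips  # each slot is written by exactly one thread
    phase2 = [None] * num_ips

    def run_ip(ip):
        # Phase 1: threads proceed asynchronously (random delay).
        time.sleep(random.uniform(0.0, 0.01))
        phase1[ip] = ip * ip

        # Every thread waits here until all phase-1 writes are done.
        barrier.wait()

        # Phase 2: reading the neighbor's slot needs no lock, because
        # the barrier guarantees it was written before anyone got here.
        neighbor = (ip + 1) % num_ips
        phase2[ip] = phase1[ip] + phase1[neighbor]

    threads = [threading.Thread(target=run_ip, args=(i,))
               for i in range(num_ips)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return phase2


if __name__ == "__main__":
    print(run_node())
```

Without the barrier.wait() call, a fast thread could reach phase 2
before its neighbor had written phase1, and the addition would fail on
a None value.  With it, the result is the same on every run regardless
of the random delays.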