Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!think!yale!cmcl2!edith!jan!edler
From: edler@jan.ultra.nyu.edu (Jan Edler)
Newsgroups: comp.arch
Subject: Re: Inverted Page Tables
Message-ID: <2085@edith.ultra.nyu.edu>
Date: 2 Mar 90 22:16:32 GMT
References: <37877@cornell.UUCP>
Sender: news@edith.ultra.nyu.edu
Distribution: comp
Organization: New York University, Ultracomputer project
Lines: 68

In article <37877@cornell.UUCP> huff@cs.cornell.edu (Richard Huff) writes:
> What is currently being done, or being proposed, for such large NUMA
> MIMD machines?  How does the Butterfly II, Ultracomputer, or RP3 do
> virtual to physical address translation?  Do they employ a separate
> virtual address space per process?  Is it 32-bits, or larger?
>
> Is anyone out there considering building a NUMA MIMD shared memory
> machine with a single, machine wide, 64 bit virtual address space?

There are a couple of different issues here.

First is one of terminology and taxonomy: the acronyms UMA and NUMA are somewhat slippery, at least for some machines (like the NYU Ultracomputer). In the "generic" Ultracomputer design, all memory is equally far from all processors, which makes it a UMA machine. Of course, we add caches (possibly with only software-controlled cacheability, to enforce coherency), which makes the UMA designation less appropriate. If we add local memory, some would call the design NUMA. Yet we don't consider such modifications significant enough to warrant reclassifying the machine; we still think of it in much the same way, with or without local memory.

Consider the RP3, where each processor enjoys fast access to a co-located memory module (there is no memory equally far from all processors): is it UMA or NUMA? Virtual address ranges can be interleaved across the memory modules or sequential within a memory module, as controlled by the MMU.
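A minimal sketch of the two placements, assuming a hypothetical machine with N_MODULES equal-sized modules (one co-located with each processor) and one-word interleaving. The module count, module size, interleave unit, and the function names are illustrative assumptions, not RP3 parameters. The sketch also counts how many of one processor's references leave its own module under each mapping.

```python
from typing import Callable, Iterable, Tuple

# Illustrative parameters only (hypothetical); not taken from the RP3.
N_MODULES = 8             # one memory module co-located with each processor
MODULE_WORDS = 1 << 20    # words per module

# A mapping turns an address into (module number, offset within that module).
Mapping = Callable[[int], Tuple[int, int]]


def sequential(addr: int) -> Tuple[int, int]:
    """Consecutive addresses fill one module before moving to the next."""
    return addr // MODULE_WORDS, addr % MODULE_WORDS


def interleaved(addr: int) -> Tuple[int, int]:
    """Consecutive addresses rotate across modules, one word per module."""
    return addr % N_MODULES, addr // N_MODULES


def nonlocal_fraction(mapping: Mapping, proc: int, addrs: Iterable[int]) -> float:
    """Fraction of processor `proc`'s references that go to another module."""
    modules = [mapping(a)[0] for a in addrs]
    return sum(m != proc for m in modules) / len(modules)
```

For processor 0 scanning words 0 through 1023, the sequential mapping keeps every reference in module 0 (fraction 0.0), while the interleaved mapping sends 7/8 of them, i.e. (n-1)/n with n = 8, to other modules.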
Sequential placement is good for private (or mostly private) memory; interleaved placement is good for shared memory. There is a spectrum of possibilities, with two extremes:
 - all "sequential": the machine appears to be NUMA
 - all "interleaved": it looks UMA ((n-1)/n of the references, for n modules, are uniform nonlocal ones)
It depends on how you plan to use (or think about) the machine.

As for how address translation is performed on the Ultracomputer, there is really no single good answer. The "generic" Ultracomputer design doesn't really address the issue, merely assuming that some sort of address translation hardware is present to support a general-purpose operating system. The issues of word and address size are also not very relevant to the generic design, except that the word size determines the largest object that can be accessed atomically with a single load or store instruction.

When considering an implementation of the Ultracomputer, things become quite different, and of course it really matters how things are done. All the specific designs we've considered at NYU have had 32-bit words and 32-bit addresses. The operating system design supports a separate address space for each process (although support for lightweight processes within a shared address space is under consideration). To date we've only considered hardware designs with fairly conventional MMUs:
 - buddy-system, TLB-only (with the mc68451 MMU),
 - multi-level segment/page tables (mc68030),
 - TLB-only, fixed-size pages (am29000).
In all cases, we've considered page tables (or their equivalent) to reside in shared memory (this was also the case with our OS design study for the RP3, which was to support sequential as well as interleaved memory).

Other factors to consider are cache control and TLB coherence. Up to now, our designs have relied entirely on software for cache coherency, so we assume the MMU can indicate the cacheability of each reference.
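To make "the MMU can indicate the cacheability of each reference" concrete, here is a minimal sketch of a two-level page-table walk whose entries carry a cacheable flag that software could clear, for example, on pages whose coherence it manages itself. The 10/10/12-bit split of a 32-bit address, the PTE fields, and the names translate and PageFault are hypothetical; this is not the descriptor format of the mc68030 or any other MMU mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical 32-bit layout: 10-bit level-1 index | 10-bit level-2 index | 12-bit offset.
PAGE_SHIFT = 12                      # assumed 4 KB pages
L2_BITS = 10
L1_BITS = 32 - PAGE_SHIFT - L2_BITS  # 10
L1_SHIFT = PAGE_SHIFT + L2_BITS      # 22


@dataclass
class PTE:
    frame: int       # physical page frame number
    cacheable: bool  # cleared by software for pages it must keep coherent itself


class PageFault(Exception):
    """Raised when no level-1 or level-2 entry maps the address."""


# Level-1 table: L1 index -> level-2 table; level-2 table: L2 index -> PTE.
PageTable = Dict[int, Dict[int, PTE]]


def translate(table: PageTable, vaddr: int) -> Tuple[int, bool]:
    """Walk both levels; return (physical address, cacheable flag)."""
    l1 = (vaddr >> L1_SHIFT) & ((1 << L1_BITS) - 1)
    l2 = (vaddr >> PAGE_SHIFT) & ((1 << L2_BITS) - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    second_level = table.get(l1)
    if second_level is None or l2 not in second_level:
        raise PageFault(hex(vaddr))
    pte = second_level[l2]
    return (pte.frame << PAGE_SHIFT) | offset, pte.cacheable
```

With a table mapping level-1 index 0, level-2 index 1 to frame 0x42 (cacheable), translating 0x1234 gives physical address 0x42234 and a cacheable flag of True; an address with no entry raises PageFault.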
Hardware cache or TLB coherence schemes can impact the MMU in various ways, some of them favoring a globally shared address space (but not necessarily with a flat addressing scheme).

Jan Edler
NYU Ultracomputer Project
edler@nyu.edu
(212) 998-3353