Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!ucbarpa.Berkeley.EDU!melvin
From: melvin@ucbarpa.Berkeley.EDU (Steve Melvin)
Newsgroups: comp.arch
Subject: Re: Context switching on RISC chips
Summary: Scheduling Basic Blocks and Variability in Memory Latency
Keywords: Context Switching, Interrupts, MIMD Architecture, HEP
Message-ID: <33452@ucbvax.BERKELEY.EDU>
Date: 3 Jan 90 10:34:11 GMT
References: <3167@iitmax.IIT.EDU> <14007@pur-ee.UUCP>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: melvin@ucbarpa.Berkeley.EDU.UUCP (Steve Melvin)
Organization: University of California, Berkeley
Lines: 56

In article <14007@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>First of all, it isn't just the register file which has gotten big -- it's
>the complete localized process state. This includes registers, caches, even
>process-specific page tables and disk buffers. Second, it has NOTHING TO DO
>WITH BEING RISC -- chips are fast, talking with other chips is slow, talking
>with other boards is even slower, so ANY high-performance architecture
>naturally tends toward a larger, longer lived, localized process state.
>

Your point that the localized process state includes more than just
registers is well taken, but I'd like to take issue with your second
statement. There are indeed architectural features which affect the
size of the process state, independent of the speed differential
between the processor and the rest of the world.

The key is that temporary results which are computed and used within a
basic block need not be stored in "named" registers. That is, they may
be internal registers but NOT part of the process state if, for
example, context switches are never initiated except on basic block
boundaries (interrupts can either have long latencies or work can be
discarded). Of course, how much the state can be reduced while
preserving the advantages of the same number of internal registers
depends on the application and the compiler.
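
As a rough illustration of that distinction, here is a minimal Python
sketch that splits the values of one basic block into those that cross
a block boundary (and so would need named, architectural registers) and
those produced and consumed entirely inside the block. The block, the
value names and the classify_values helper are all hypothetical, and
the sketch assumes each name is assigned at most once within the block.

```python
def classify_values(block, live_out):
    """Split a basic block's values into architectural and internal sets.

    block    -- list of (dest, sources) tuples in program order
    live_out -- names still needed after the block ends
    Assumes every name is assigned at most once (hypothetical input format).
    """
    live_out = set(live_out)
    defined = set()
    live_in = set()
    for dest, sources in block:
        for src in sources:
            if src not in defined:
                live_in.add(src)  # read before any definition: comes from outside
        defined.add(dest)
    architectural = live_in | (defined & live_out)  # crosses a block boundary
    internal = defined - live_out                   # never visible outside the block
    return architectural, internal


# Hypothetical block computing d = ((a + b) * c - a) + (a + b)
block = [
    ("t1", ["a", "b"]),
    ("t2", ["t1", "c"]),
    ("t3", ["t2", "a"]),
    ("d", ["t3", "t1"]),
]
architectural, internal = classify_values(block, live_out={"d"})
print(sorted(architectural))  # ['a', 'b', 'c', 'd']
print(sorted(internal))       # ['t1', 't2', 't3']
```

In this toy case, if context switches occur only at block boundaries,
just four of the seven values would have to be part of the saved
process state; t1, t2 and t3 could live in internal registers.
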
The point is that by increasing the size of the unit of work which the
processor considers "atomic" (I call this an Execution Atomic Unit),
fewer *architectural* registers are required. The main reason one
would want to do this is to allow more parallelism to be exploited,
but that's another issue.

>
>BTW, you might say that processes can require context switches for
>synchronous events (e.g., loading a value from memory which is far away),
>but IMHO the use of a context switch is usually overkill in such cases
>(sorry, Burton ;-). This is because, with the right architecture,
>synchronous delay events can be hidden using static (compile-time)
>scheduling (e.g., code motions to hide delayed loads).
>

Well, first of all, the use of the term "context switch" is a little
misleading as applied to the HEP-1. That machine allows the context
for up to 50 processes to be present in the processor simultaneously,
eight of which are actively in the pipeline at any time. When a
process needs to wait for memory, it is simply not returned to the
group (of 50) which are candidates for being scheduled.

Secondly, I disagree that with the "right architecture" synchronous
delay events can be hidden. If there is no variance in memory latency
(i.e., if you have no cache or if you have a 100% hit rate), then the
compiler can do a pretty good job (branch predictability, static vs.
dynamic, is also important here, but that's another issue). However,
when memory latency becomes more variable, dynamic scheduling starts
becoming more important (this effect also grows as the number of
parallel operations increases).

-------
Steve Melvin
University of California, Berkeley
-------
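
A toy model of the latency argument above, sketched in Python. It is
not the HEP-1's actual scheduler or any real compiler: the function
names, the cycle counts, the "every instruction is a load" workload
and the fixed-priority choice among ready contexts are all assumptions
made for illustration. The first function charges a stall whenever a
load takes longer than the gap the compiler scheduled for it; the
second skips contexts that are waiting on memory and counts the issue
slots in which nothing is ready.

```python
def static_stall_cycles(latencies, scheduled_gap):
    """Stall cycles when each load's use is fixed `scheduled_gap` cycles after it."""
    return sum(max(0, latency - scheduled_gap) for latency in latencies)


def multicontext_idle_cycles(per_context_latencies):
    """Idle issue slots when waiting contexts are skipped (one issue per cycle)."""
    remaining = [list(lats) for lats in per_context_latencies]
    ready_at = [0] * len(remaining)  # cycle at which each context may issue again
    cycle = idle = 0
    while any(remaining):
        for i, loads in enumerate(remaining):
            if loads and ready_at[i] <= cycle:
                ready_at[i] = cycle + loads.pop(0)  # issue a load, then wait on it
                break
        else:
            idle += 1  # every context with work left is waiting on memory
        cycle += 1
    return idle


# Same mean latency (4 cycles), with and without variance.
print(static_stall_cycles([4, 4, 4, 4], scheduled_gap=4))   # 0
print(static_stall_cycles([2, 10, 2, 2], scheduled_gap=4))  # 6

# One context versus four contexts sharing the pipeline.
print(multicontext_idle_cycles([[2, 10, 2, 2]]))      # 11
print(multicontext_idle_cycles([[2, 10, 2, 2]] * 4))  # 4
```

With uniform latency the fixed gap hides everything; with the same
mean but one slow access it cannot. In the multi-context model the
idle slots per load drop as more contexts become available to fill
them.
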