Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!samsung!uakari.primate.wisc.edu!ames!amdcad!nucleus!tim
From: tim@nucleus.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Context switching on RISC chips
Message-ID: <28573@amdcad.AMD.COM>
Date: 1 Jan 90 20:30:16 GMT
References: <3167@iitmax.IIT.EDU>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 62

In article <3167@iitmax.IIT.EDU> ed@iitmax.iit.edu (Ed Federmeyer) writes:
| One of the things that seems to characterize RISC chips is the relatively
| large number of registers. This makes me wonder what happens during a
| context switch. After all, moving 256 (or more) registers to memory, and
| then another 256 (or more!) back in for each context switch seems like an
| awful lot of overhead. I can think of a few ways around this:
|
| 1) Do nothing special... Suffer
|
| 2) Have each register "tagged" like a cache, so only the "dirty" registers
|    need to be moved out. You'd still have to load in all the old ones.
| 3) Have a few register "sets", i.e., a context switch really moves a pointer
|    to a bank of registers (of which there are several on-chip).
| 4) Like 3, but only have 2 sets. While context 2 is processing, drain out
|    context 1's set so it's ready by the next switch. Since a RISC chip
|    seems to execute 1 instruction in 1 cycle, I can't see that there are
|    a lot of extra bus cycles. (Unlike in a CISC, where you might have hundreds
|    of clock cycles free while the processor executes an instruction in which
|    the bus is not being used.) Unless of course you have a second bus
|    going to memory dedicated to just shuttling registers in and out.

Here are 3 methods that can be used to reduce context-switch time on the
Am29000 RISC processor:

1) Register Banking (like your #3).
   Because the large local register file can be addressed at an offset
   from a stack-pointer value, it can be divided into separate register
   banks. A context switch then consists of saving and restoring a few
   special registers (to the on-chip global registers) and changing the
   stack pointer to use the new register bank. Register banks can be
   protected from user-mode access in groups of 16 registers. A full
   context switch takes about 1 microsecond at 25MHz. This method is
   supported in the Am29000 hardware, but current software tools do not
   make use of it.

2) Reduced Stack Cache Size.
   The standard Am29000 calling convention uses the local register file
   as a run-time stack cache. The maximum number of registers used in the
   cache can be regulated by a user-defined register spill/fill trap
   handler, which is invoked whenever a portion of the cached stack must
   be saved to or restored from memory. The number of registers used can
   range from ~32 to 128, allowing the system designer to make the right
   trade-off between context-switch time and individual task performance.
   Using this technique, context-switch times can be varied from ~9
   microseconds with the smallest cache to ~17 microseconds with the full
   128 registers, using 2-cycle first-access, single-cycle burst memory
   at 25MHz.

3) Saving/Restoring Live Registers Only (like your #2).
   In systems where security (covert channels) is not an issue (as in
   most real-time embedded control systems), the context-switch time can
   be reduced by saving only the live local registers, i.e. those found
   between the current stack pointer and the top of the stack cache. This
   is typically about half of the total local registers. In addition,
   only the current procedure frame of the new context needs to be
   restored; the rest of the stack cache will be "faulted in" if
   necessary. This results in context-switch times of ~10 microseconds
   using the full 128-register stack cache.

--
Tim Olson
Advanced Micro Devices
(tim@amd.com)