Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Context switching on RISC chips
Summary: We will switch no context before its time.
Keywords: Context Switching, Interrupts, MIMD Architecture
Message-ID: <14007@pur-ee.UUCP>
Date: 1 Jan 90 17:21:31 GMT
References: <3167@iitmax.IIT.EDU>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 63

In article <3167@iitmax.IIT.EDU> ed@iitmax.iit.edu (Ed Federmeyer) writes:
>One of the things that seems to characterize RISC chips is the relatively
>large number of registers. This makes me wonder what happens during a
>context switch. After all, moving 256 (or more) registers to memory, and
>then another 256 (or more!) back in for each context switch seems like an
>awful lot of overhead. I can think of a few ways around this:
>
>1) Do nothing special... Suffer
>2) Have each register "tagged" like a cache, so only the "dirty" registers
>   need to be moved out. You'd still have to load in all the old ones.
>3) Have a few register "sets". Ie, a context switch really moves a pointer
>   to a bank of registers (of which there are several on-chip).
>4) Like 3, but only have 2 sets. While context 2 is processing, drain out ...

First of all, it isn't just the register file which has gotten big --
it's the complete localized process state. This includes registers,
caches, even process-specific page tables and disk buffers. Second, it
has NOTHING TO DO WITH BEING RISC -- chips are fast, talking with other
chips is slow, talking with other boards is even slower, so ANY
high-performance architecture naturally tends toward a larger,
longer-lived, localized process state.

As for your list of choices, you left out two favorites:

5) Initiate the context switch before it is needed... this looks a lot
   like the incremental checkpointing done by fault-tolerance folk.
6) Since nobody builds uniprocessors anymore (;-), NEVER interrupt a
   processor: simply let another processor which happens to be free at
   that moment (or the next processor to become free) handle the
   interrupt.

Number 6 is the really interesting one. Suppose you have an MIMD
architecture with perhaps 64 processors and typically only about 5-10
programs running simultaneously... you should use execution-time changes
in the parallelism width of each program to implement priorities. The OS
scheduler (which is partly hardware) would simply ensure that there is
always at least one processor free to service any time-critical
interrupt which might arrive (e.g., "incoming ICBMs detected"). Less
critical interrupts (e.g., "Joe Luser types the letter Q") can simply be
buffered awaiting a free processor... one will become free as soon as a
program either terminates or reaches a point at which it can change
parallelism width. With "enough" processors, this stochastic wait for a
free processor can be made arbitrarily short....

BTW, you might say that processes can require context switches for
synchronous events (e.g., loading a value from memory which is far
away), but IMHO the use of a context switch is usually overkill in such
cases (sorry, Burton ;-). This is because, with the right architecture,
synchronous delay events can be hidden using static (compile-time)
scheduling (e.g., code motions to hide delayed loads).

Besides that, it doesn't bother me very much if a few of many processors
are needlessly idle. Why? Because adding context switching ability adds
a surprisingly large degree of circuit complexity (as you have implied
above); my wild guesstimate is that it would be at least 30%. Suppose I
have the choice between having 100 context-switchable processors or 130
otherwise-equivalent non-interruptible processors... about 30
non-interruptible processors would have to be idle before I'd get
worried. Oh yes...
the non-interruptible processors will probably also have fewer gate
delays per basic operation -- they'll run with a faster clock.

-hankd@ecn.purdue.edu
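
For anyone who wants to play with the 100-versus-130 trade-off, here is
a rough back-of-the-envelope sketch in Python. It is not part of the
original argument: the function name break_even_idle and the
clock_speedup parameter are made up for illustration, and it assumes
that circuit cost scales linearly with processor count and that all
processors are otherwise equivalent. The 30% overhead is the
guesstimate from the text, not a measured figure.

```python
def break_even_idle(switchable: int, overhead: float,
                    clock_speedup: float = 1.0) -> float:
    """Idle non-interruptible processors tolerable before the simpler
    design falls behind a fully busy context-switchable design.

    switchable:    processor count of the context-switchable design
    overhead:      fractional circuit cost of context switching
                   (0.30 is the guesstimate from the post)
    clock_speedup: hypothetical clock advantage of the simpler
                   processor (an assumption; 1.0 means no advantage)
    """
    # Assumption: the same circuit budget buys this many simple processors.
    simple = switchable * (1.0 + overhead)
    # Busy simple processors needed to match the switchable design's work.
    needed = switchable / clock_speedup
    return simple - needed


if __name__ == "__main__":
    # The post's numbers: 100 switchable vs. 130 non-interruptible.
    print(break_even_idle(100, 0.30))
    # Same budget, with a hypothetical 10% faster clock on the simple design.
    print(break_even_idle(100, 0.30, clock_speedup=1.1))
```

With no clock advantage this gives back the figure of about 30 idle
processors; any clock advantage for the simpler processors only raises
the number that can sit idle before the trade stops paying off.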