Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!pasteur!ames!oliveb!sun!joe!petolino
From: petolino%joe@Sun.COM (Joe Petolino)
Newsgroups: comp.arch
Subject: Re: RISC & context switches
Message-ID: <89745@sun.uucp>
Date: 14 Feb 89 23:22:07 GMT
References: <784@atanasoff.cs.iastate.edu> <1989Feb12.002935.21396@utzoo.uucp>
Sender: news@sun.uucp
Reply-To: petolino@sun.UUCP (Joe Petolino)
Organization: Sun Microsystems, Mountain View
Lines: 53

>> I seem to recall there was (is?) a TI processor which had all of
>> its registers in memory except 1 register which pointed to
>> the other registers, so a context switch was just save/restore
>> that one register. Could a similar concept be implemented
>> with all the registers in the chip?

>You can use the AMD 29000 that way, in fact, although doing register
>windows is more popular in Unix environments. If you dedicate a set of
>16 registers to each process, and dedicate most of the global registers
>saving the rest of the state for the processes, you can have 8 processes
>running with a context-switch time of something like 17 cycles.

This same trick could be used with SPARC, too, for example if you were
writing a real-time OS that needed fast, predictable context switch
timing. The 'Current Window Pointer' (CWP) is a field of the PSR -
writing a new value into the PSR gives you a whole new set of window
registers, preserving the old register values.

For those not familiar, here's a quick overview of the way SPARC
registers work: there are eight global registers (one of them, g0, is
hard-wired as a constant 0), plus a circular file of windowed registers.
The size of this register file is implementation-dependent (it's 112
registers on the Sun4 chip). At any one time, the processor has access
to a 'window' of 24 of these registers, starting at the one pointed to
by the CWP field of the PSR (the CWP always points to a register whose
number is a multiple of 16).
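
As a rough illustration of the layout described so far, here is a minimal Python sketch of how a window-relative register could map onto the circular file. It uses the Sun4 figure of 112 windowed registers given above. The names (`physical_index`, `window`, the 0..23 "slot" numbering) are hypothetical, and treating the CWP as a small window number that selects register `16 * cwp` is an assumption made for the sketch, not a statement about the hardware encoding.

```python
# Simplified model of the windowed register file (globals are left out).
FILE_SIZE = 112                   # windowed registers in the circular file (Sun4)
WINDOW_SIZE = 24                  # registers visible at any one time
STEP = 16                         # the window always starts on a multiple of 16
NUM_WINDOWS = FILE_SIZE // STEP   # number of distinct CWP values in this model


def physical_index(cwp, slot):
    """Map a window-relative slot (0..23) to an index in the circular file."""
    if not 0 <= cwp < NUM_WINDOWS:
        raise ValueError("cwp out of range")
    if not 0 <= slot < WINDOW_SIZE:
        raise ValueError("slot out of range")
    # The file is circular, so the last window wraps around to index 0.
    return (cwp * STEP + slot) % FILE_SIZE


def window(cwp):
    """Return the 24 file indices visible when the CWP selects this window."""
    return [physical_index(cwp, slot) for slot in range(WINDOW_SIZE)]


if __name__ == "__main__":
    for cwp in range(NUM_WINDOWS):
        w = window(cwp)
        print(cwp, w[0], w[-1])
```

In this model, switching to a different register set is nothing more than changing `cwp`; the values held in the other windows are untouched.
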
The CWP can change (- or +, mod the size of the register file) in
increments of 16 registers, in response to two instructions (save and
restore) which are normally used in conjunction with the instructions
that do procedure calls and returns. Thus, the 24-register window of a
called routine overlaps its caller's 24-register window by eight
registers. When a trap occurs, the CWP automatically moves up by sixteen
registers.

You can think of it as a poor man's stack cache - the poverty part is
that the stack pointer can only move in increments of 16, the CPU can
only look at the top 24 words of the stack, and it has a finite size
that must be managed by the OS. That management is facilitated by the
'Window Invalid Mask' (WIM), a special register with a bit for each
possible value of the CWP. If a save or restore instruction would cause
the CWP to decrement or increment to a value whose corresponding WIM bit
is 1, then that instruction traps, and the OS must free up some
registers (and update the WIM) before continuing.

Note that, in an application where fast context switches between a small
number of processes were the most important factor, you wouldn't even
use the WIM. You'd write all the code without save and restore
instructions (note that these operations are *not* part of the
call/return instructions), and instead use normal loads and stores to
save the state of the registers across procedure calls. The OS could
then allocate 32 consecutive registers (i.e. two adjacent CWP values)
for each process: one 24-register window to run in, and another 8 (in
the next window above) for trap handlers to use.

-Joe Petolino
"I don't work for Marketing. Nobody told me to write this. As far as I
know, it's all true!"
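
The WIM check and the fixed two-windows-per-process allocation described above can be sketched in Python as follows. This is only a model under stated assumptions: the 112-register Sun4 figure comes from the post, while the function names, the use of small integers as process IDs, and the choice to give process `pid` the window pair `(2 * pid, 2 * pid + 1)` are all hypothetical.

```python
# Model of the WIM test and of a fixed window allocation per process.
FILE_SIZE = 112                   # windowed registers (Sun4 figure from the post)
STEP = 16                         # registers per CWP increment
NUM_WINDOWS = FILE_SIZE // STEP   # distinct CWP values in this model
WINDOWS_PER_PROCESS = 2           # one window to run in, one for trap handlers
MAX_PROCESSES = NUM_WINDOWS // WINDOWS_PER_PROCESS


def would_trap(cwp, wim, delta):
    """True if moving the CWP by delta (+1 or -1) hits a window marked in WIM."""
    target = (cwp + delta) % NUM_WINDOWS
    return bool((wim >> target) & 1)


def process_windows(pid):
    """Return (run_cwp, trap_cwp) for a process under the assumed allocation."""
    if not 0 <= pid < MAX_PROCESSES:
        raise ValueError("no window pair left for this process")
    run_cwp = WINDOWS_PER_PROCESS * pid
    return run_cwp, run_cwp + 1


def register_range(pid):
    """Half-open range of the 32 consecutive file registers given to pid."""
    run_cwp, _ = process_windows(pid)
    first = run_cwp * STEP
    return first, first + WINDOWS_PER_PROCESS * STEP


if __name__ == "__main__":
    for pid in range(MAX_PROCESSES):
        print(pid, process_windows(pid), register_range(pid))
```

With these numbers the model has room for three resident processes, and a context switch amounts to loading the target process's `run_cwp` into the CWP field.
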