Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!apple!voder!pyramid!prls!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: RISC v. CISC (was The NeXT problem)
Message-ID: <7180@winchester.mips.COM>
Date: 27 Oct 88 20:44:12 GMT
References: <156@gloom.UUCP> <310@lynx.zyx.SE> <332@pvab.UUCP> <15964@agate.BERKELEY.EDU> <7681@boring.cwi.nl>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 67

In article <7681@boring.cwi.nl> jack@cwi.nl (Jack Jansen) writes:
>Well, 100 usec might be fine for standard unix, it is definitely not
>fine for operating systems supporting light-weight threads.

>In amoeba, our distributed system, thread-to-thread switch time
>is in the order of 20-50usec, and on a fast machine like a R2000
>it would probably be down to 5-20usec, not counting the register
>save.

>What I would like is some help from the architecture, like dirty bits on
>groups of registers or something.

>Actually, I'm not *that* familiar with the R2000 (or the other risc
>chips, for that matter); do any of them provide a feature for this?

There are two styles of doing this, most typically associated with
the floating-point register file.
a) Keep a dirty bit.
b) Keep a "useable" bit, where you trap if somebody issues an FP instruction.

In case a), on a context switch from task 1 to 2:
    if 1's registers are dirty, save them
    load 2's state into the registers
    switch
In case b), for the same context switch:
    maintain an "owner" for the FP regs, which is either a task (X), or empty
    note that 1 may well not own the FP regs at this point
    before switching to 2:
        if 2 is the owner of the FP regs, turn useability on
        if 2 is not the owner, turn useability off
    switch to 2
    if 2 uses an FP op, trap it
        save the FP state into X's context
        load up 2's FP state into the registers
        owner = 2
    there are variant strategies, depending on how fancy you want to get.
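
For concreteness, here is a minimal C sketch of case b). It is a user-level
model, not kernel code: the FP register file and the useability bit are
simulated with ordinary variables, and every name in it (struct task,
fp_owner, fp_usable, switch_to, fp_trap, fp_write, fp_read, fp_saves) is
hypothetical, made up for illustration. The sketch also assumes the trap
handler turns useability back on before returning, which the outline above
leaves implicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NFPREGS 32

/* Hypothetical per-task context; only the FP save area is modelled. */
struct task {
    double fp_save[NFPREGS];
};

/* Simulated hardware state (stand-ins, not real machine registers). */
static double fp_regs[NFPREGS];  /* the FP register file */
static bool fp_usable;           /* the "useable" bit */

/* Kernel bookkeeping. */
static struct task *fp_owner;    /* task whose state is in fp_regs, or NULL */
static struct task *current;     /* task now running */
static unsigned fp_saves;        /* number of FP state saves, for illustration */

/* Context switch: no FP state is moved, only the useable bit is set. */
void switch_to(struct task *next)
{
    fp_usable = (fp_owner == next);
    current = next;
}

/* Trap taken when the current task issues an FP op with useability off. */
void fp_trap(void)
{
    assert(current != NULL && !fp_usable);
    if (fp_owner != NULL) {
        /* Save the FP state into the previous owner's context. */
        memcpy(fp_owner->fp_save, fp_regs, sizeof fp_regs);
        fp_saves++;
    }
    /* Load the current task's FP state and make it the owner. */
    memcpy(fp_regs, current->fp_save, sizeof fp_regs);
    fp_owner = current;
    fp_usable = true;   /* assumption: handler re-enables FP use */
}

/* Model of an FP instruction that writes register r. */
void fp_write(int r, double v)
{
    if (!fp_usable)
        fp_trap();
    fp_regs[r] = v;
}

/* Model of an FP instruction that reads register r. */
double fp_read(int r)
{
    if (!fp_usable)
        fp_trap();
    return fp_regs[r];
}
```

In this model, a task that switches away and back while nobody else touches
the FP unit pays no save or restore at all; the cost is only incurred on the
first FP op by a non-owner.
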

MIPS has a useability bit for each coprocessor; we also actually
keep bits in the executables that say which registers got used.
[we put these in just in case, although more for special-purpose
environments.  They turn out not to be very useful: the optimizers are
too good at grabbing every register.]
SPARC uses a similar technique, I think.
Clipper uses a dirty bit.
Various other micros do one or the other.

BTW: it is not instantly obvious that one would add a bit in for just
this purpose.  On a 16.7MHz M/120, it takes something like 4-30
microseconds to save 32 registers and restore 32 registers
[the 4 is all cache hit, the 30 is all cache miss].
On a 25MHz M/2000, it takes 3-10 microseconds, even with a large
(i.e., inherently longer-latency) memory system: note that
block refill of the caches helps a lot in that case.
I'd guess that "typical" numbers, especially in a high context switch
environment, would be on the order of 15 & 7 microseconds, respectively,
in a general-purpose environment.  [In a real-time environment, one
would gimmick some of the things to avoid I-cache misses.]

Thus, a useability bit might save this for you, some of the time.
We actually put it in for several reasons:
    a) Symmetry: we actually use a useability bit on coprocessor 0,
       which subsumes what would otherwise be privileged ops.
    b) Simplicity of handling systems without an FPU.
    c) and, finally, the ability to avoid FP context switches as described.
-- 
-john mashey    DISCLAIMER:
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086