Path: utzoo!utgpu!watserv1!watmath!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!apple!agate!darkstar!cs.washington.edu
From: pardo@cs.washington.edu (David Keppel)
Newsgroups: comp.os.research
Subject: Re: How light weight can a process get?
Message-ID: <9533@darkstar.ucsc.edu>
Date: 29 Nov 90 05:10:17 GMT
Sender: usenet@darkstar.ucsc.edu
Organization: University of Washington, Computer Science, Seattle
Lines: 63
Approved: comp-os-research@jupiter.ucsc.edu

jms@central.cis.upenn.edu (Jonathan M. Smith) writes:
>Notes on "lightweight" processes
>[Using setjmp/longjmp]

In case anybody cares, that's not a portable implementation as, e.g.,
a system may unwind the stack on a |longjmp|.

>[7 microseconds per context switch on a RS/6000.]
>[Context switch includes save/restore all machine registers.]

I'm not familiar with the processor architecture. If I assume 32
integer and 32 floating-point registers, a clock of about 30MHz (call
it a 30ns cycle), and no stalls, then in 7 microseconds I can execute
7000/30 = 233 instructions. Saving and restoring 64 registers should
take at least 128 cycles (one store and one load per register), but
the rest of the context switch could be pretty cheap - a few dozen
instructions. So you're within a factor of two, but some shaving
could be done.

Another data point: GCC (the GNU C compiler) has the capability to say
``certain registers are clobbered by this |asm()|''. I have used this
capability to implement a nonpreemptive multiprocessor lightweight
process package on the Sequent i386-based multiprocessor. It performs
(multiprocessor-locked) thread swaps on a 16MHz 6-register i386 in
slightly over 30 microseconds, including the overhead of two procedure
calls with a dynamically-bound target address. Using the |asm()|
feature lets me do *no* register saves and restores in the context
switch routine, relying instead on the compiler to insert them as
callee-saves in the procedure in which the context switch is
performed.
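
A minimal sketch of the clobber idea, for illustration only. It assumes
a present-day GCC or Clang on x86-64 rather than the 1.x GCC and i386
described above, and the names |clobber_callee_saved| and
|sum_across_switch| are hypothetical. The asm body just zeroes the
listed registers as a stand-in for a real thread swap; on other
targets it falls back to a memory-only clobber.

```c
#include <assert.h>

/* Hypothetical stand-in for a context switch. The asm body really does
 * destroy the callee-saved registers named in its clobber list, and it
 * contains no save or restore code of its own. */
static inline void clobber_callee_saved(void)
{
#if defined(__x86_64__)
    __asm__ volatile(
        "xorl %%ebx, %%ebx\n\t"     /* zero rbx */
        "xorl %%r12d, %%r12d\n\t"   /* zero r12 */
        "xorl %%r13d, %%r13d\n\t"   /* zero r13 */
        "xorl %%r14d, %%r14d\n\t"   /* zero r14 */
        "xorl %%r15d, %%r15d"       /* zero r15 */
        : /* no outputs */
        : /* no inputs */
        : "rbx", "r12", "r13", "r14", "r15", "cc", "memory");
#else
    /* Assumption: on other targets only a memory clobber is shown. */
    __asm__ volatile("" : : : "memory");
#endif
}

/* Values live across the asm must survive it. Because of the clobber
 * list, the compiler either keeps them out of the listed registers or
 * saves and restores those registers in this function. */
long sum_across_switch(long a, long b, long c, long d)
{
    long before = a + b + c + d;
    clobber_callee_saved();
    return before + a * b + c * d;
}
```

Calling |sum_across_switch(1, 2, 3, 4)| should give the same answer as
it would without the asm; the cost of preserving the registers shows up
in the enclosing function rather than in the switch itself.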
The unfortunate thing is that the 1.x series GCC simply avoids using
certain registers instead of treating the |asm()| as a register
`kill', so the code generated around the context switch is inferior.
That could be avoided with a superior register allocator (in 2.x
maybe?).

>[Or, you can use a register window pointer like the Sun-4s.]

Unfortunately, massaging the window pointer on a conforming SPARC is a
kernel-reserved operation, so even a minimalist implementation has to
pay the kernel trap cost. The default primitives that are provided
with, e.g., SunOS are targeted for procedure calls, not threads, so
you can't really do threads using windows under SunOS on the SPARC.
Further, saving the register file on overflow/underflow has been
optimized, but the general `flush' appears to be significantly slower.

The APRIL project [Agarwal, Lim, Kranz, and Kubiatowicz; Proceedings
of the 17th Annual International Symposium on Computer Architecture,
IEEE Computer Society Press, pg. 104, 1990] will implement context
switching using a slightly modified SPARC in an estimated 11 cycles
for ``in-cache'' context swaps. Of that, 6 cycles are for the trap
handler. Up to 4 threads may be in-cache at any given time.
``Out of cache'' context swaps require saving and restoring 32 integer
and 8 floating-point registers, so the context switch time will be
like that of a conventional processor (no kernel access will be
required, since only the current window must be saved and restored).

How light weight can a process get? Pretty light.

	;-D on  ( Tastes great, less filling, swaps fast )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo