Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!rochester!pt.cs.cmu.edu!andrew.cmu.edu!zs01+
From: zs01+@andrew.cmu.edu (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: Coroutine switching (Was: Register usage)
Message-ID:
Date: 20 Jun 89 19:11:41 GMT
References: <259@mindlink.uucp> <25382@ames.arc.nasa.gov> <1RcY6x#64Zq3Y=news@anise.acc.com> <26204@ames.arc.nasa.gov>, <20810@orac.mips.COM>
Organization: Information Technology Center, Carnegie Mellon, Pittsburgh, PA
Lines: 220
In-Reply-To: <20810@orac.mips.COM>

(I'm a bit behind in comp.arch. This posting is in regard to doing coroutines or lightweight processes on register-windowed machines.)

As the person who wrote the MIPS R2000 and Sun SPARC assembly code for our lightweight process (LWP) package, I can say a little bit about this. The R2000 port took about 2 hours; the SPARC port took 2 days. If nothing else, register windows are harder to understand.

The main problem with the SPARC was understanding how things work, like the fact that the frame pointer and the stack pointer are in the same place for adjacent windows. That is, sp in the current window before a save instruction is fp in the current window after the save.

Also, one must understand what the kernel does to flush windows. Basically, if the kernel ever has to flush a window, it stores that window at the address in that window's fp register. It is very important that fp always be correct; otherwise a UNIX context switch could store the window into a random place. This was the hardest part of debugging the code: at random times a register window would be stored into the middle of another process' stack. Also, the kernel keeps track of which windows are user context and which are kernel context. This is so that the kernel can flush all user context only when necessary.

SunOS provides a trap to flush all the windows to the stack.
Combined with the fact that the kernel never restores more than one window at a time, this is all you need to write a coroutine package. If the kernel "prefetched" register windows, you would have a problem: you could flush the windows, have UNIX context switch on the very next instruction, and come back with more than one window loaded. That extra window would get flushed later, which would write over some other LWP's stack...

I also looked at doing this for the AMD 29000. At least there, everything is in user space and I wouldn't have had to worry about what the kernel was doing. I think that would have been reasonably easy, although one still wants a trap to save all the registers. That way, the register saving code can occur once in the entire system and it might even be in the cache. This might be important when you need code to potentially save 128 registers.

Also, I don't think comments on context switching apply in these situations. If register saving is only 5% of your context switch for an LWP package, then you have done something dreadfully wrong. The two register window implementations I know of (SPARC and AMD 29k) both hedge their bets on this one by letting you bust up the register file into pieces and dedicating each piece to a single LWP. Then all you have to do is write code to handle caching of contexts in the register file and flushing/restoring them when necessary. This wouldn't be too hard and would probably have great performance. However, in the case of the SPARC, SunOS 4.0 and the Sun compilers make this impossible.

Anyway, here's the code. The routines used by this package are savecontext and returnto. Savecontext takes a function to call, an area in which to save state, and possibly a stack to switch to. Returnto takes an area and returns to the context that last called savecontext with that area. Generally, an area is just a pointer to a long that holds a stack pointer.
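To make the calling pattern concrete, here is a rough sketch in portable C. It is only an analogy and not the package's code: it uses the POSIX ucontext calls (getcontext, makecontext, swapcontext) in place of savecontext and returnto, and the ctx field, worker, run_demo, and trace are names made up for illustration. A single swapcontext call plays both roles at once: it saves the current state into one area and resumes whatever was last saved in another.

```c
#define _XOPEN_SOURCE 600   /* some systems want this for the ucontext calls */
#include <assert.h>
#include <ucontext.h>

/* Hypothetical save area. The original holds little more than a stack
 * pointer; this sketch stores a whole ucontext_t instead. */
struct savearea { ucontext_t ctx; };

static struct savearea main_area, worker_area;
static char worker_stack[64 * 1024];   /* plays the role of newsp */
static int trace[8];
static int ntrace;

/* Runs on its own stack and hands control back to the main context. */
static void worker(void)
{
    trace[ntrace++] = 2;
    swapcontext(&worker_area.ctx, &main_area.ctx);  /* save self, resume main */
    trace[ntrace++] = 4;
    /* Returning from here resumes uc_link, i.e. main_area. */
}

/* Interleaves the main context and the worker; returns the number of steps. */
int run_demo(void)
{
    int rc;

    ntrace = 0;
    rc = getcontext(&worker_area.ctx);
    assert(rc == 0);
    (void)rc;
    worker_area.ctx.uc_stack.ss_sp = worker_stack;
    worker_area.ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_area.ctx.uc_link = &main_area.ctx;
    makecontext(&worker_area.ctx, worker, 0);

    trace[ntrace++] = 1;
    swapcontext(&main_area.ctx, &worker_area.ctx);  /* save main, start worker */
    trace[ntrace++] = 3;
    swapcontext(&main_area.ctx, &worker_area.ctx);  /* save main, resume worker */
    trace[ntrace++] = 5;
    return ntrace;
}
```

Calling run_demo() bounces between the two contexts, and trace records the steps in the order they ran. The assembly versions do the same job with far less state: a stack pointer in the area, plus the registers that have to survive pushed onto the stack being left.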
In the case of the SPARC, I save the global registers into the area as well, since I don't know what the calling convention says about saving/restoring the globals. The MIPS code assumes the assembler is going to do reordering (that is, fill the delay slots).

# savecontext(f, area1, newsp)
#     int (*f)(); struct savearea *area1; char *newsp;
# returnto(area2)
#     struct savearea *area2;

MIPS:

/* Code for MIPS R2000/R3000 architecture
 * Written by Zalman Stern April 30th, 1989.
 */
#include /* Allow use of symbolic names for registers. */
#define regspace 9 * 4 + 4 + 6 * 8
#define floats 0
#define registers floats + 6 * 8
#define returnaddr regspace - 4
#define topstack 0
        .globl  savecontext /* MIPS' C compiler doesn't prepend underscores. */
        .ent    savecontext /* Insert debugger information. */
savecontext:
        li      t0, 1
        .extern PRE_Block
        sb      t0, PRE_Block
        subu    sp, regspace
        .frame  sp, regspace, ra
/* Save registers. */
        sw      s0, registers + 0(sp)
        sw      s1, registers + 4(sp)
        sw      s2, registers + 8(sp)
        sw      s3, registers + 12(sp)
        sw      s4, registers + 16(sp)
        sw      s5, registers + 20(sp)
        sw      s6, registers + 24(sp)
        sw      s7, registers + 28(sp)
        sw      s8, registers + 32(sp)
/* Save return address */
        sw      ra, returnaddr(sp)
        .mask   0xc0ff0000, -4
/* Need to save floating point registers? */
        s.d     $f20, floats + 0(sp)
        s.d     $f22, floats + 8(sp)
        s.d     $f24, floats + 16(sp)
        s.d     $f26, floats + 24(sp)
        s.d     $f28, floats + 32(sp)
        s.d     $f30, floats + 40(sp)
        .fmask  0x55400000, regspace
        sw      sp, topstack(a1)        /* area1->topstack = sp */
        beq     a2, $0, samestack       /* newsp == 0: stay on this stack */
        addu    sp, $0, a2
samestack:
        jal     a0                      /* call f() */
        .end    savecontext

        .globl  returnto
        .ent    returnto
returnto:
        lw      sp, topstack(a0)
/* Restore registers. */
        lw      s0, registers + 0(sp)
        lw      s1, registers + 4(sp)
        lw      s2, registers + 8(sp)
        lw      s3, registers + 12(sp)
        lw      s4, registers + 16(sp)
        lw      s5, registers + 20(sp)
        lw      s6, registers + 24(sp)
        lw      s7, registers + 28(sp)
        lw      s8, registers + 32(sp)
/* Restore return address */
        lw      ra, returnaddr(sp)
/* Need to restore floating point registers? */
        l.d     $f20, floats + 0(sp)
        l.d     $f22, floats + 8(sp)
        l.d     $f24, floats + 16(sp)
        l.d     $f26, floats + 24(sp)
        l.d     $f28, floats + 32(sp)
        l.d     $f30, floats + 40(sp)
        addu    sp, regspace
        sb      $0, PRE_Block
        j       ra
        .end    returnto

SPARC:

#include
#include
        .data
        .globl  _PRE_Block
topstack = 0
globals = 4

! savecontext(f, area1, newsp)
!     int (*f)(); struct savearea *area1; char *newsp;
        .text
        .globl  _savecontext
_savecontext:
        save    %sp, -SA(MINFRAME), %sp         ! Get new window
        ta      ST_FLUSH_WINDOWS                ! Flush all other active windows
/* The following 3 lines do the equivalent of: _PRE_Block = 1 */
        set     _PRE_Block, %l0
        mov     1,%l1
        stb     %l1, [%l0]
        st      %fp,[%i1+topstack]              ! area1->topstack = fp
        st      %g1, [%i1 + globals + 0]        ! Save all globals just in case
        st      %g2, [%i1 + globals + 4]
        st      %g3, [%i1 + globals + 8]
        st      %g4, [%i1 + globals + 16]
        st      %g5, [%i1 + globals + 20]
        st      %g6, [%i1 + globals + 24]
        st      %g7, [%i1 + globals + 28]
        mov     %y, %g1         ! Save this in the unlikely event that it's required
        st      %g1, [%i1 + globals + 32]
        cmp     %i2, 0
        be,a    L1                              ! if (newsp == 0) no stack switch
        nop
        add     %i2, STACK_ALIGN - 1, %i2       ! SPARC requires stricter alignment than
        and     %i2, ~(STACK_ALIGN - 1), %i2    ! malloc gives so I force alignment.
        sub     %i2, SA(MINFRAME), %fp
        call    %i0
        restore
L1:     call    %i0                             ! call f()
        nop

! returnto(area1)
!     struct savearea *area1;
        .globl  _returnto
_returnto:
        ta      ST_FLUSH_WINDOWS                ! Flush all other active windows
        ld      [%o0+topstack],%g1              ! sp = area1->topstack
        sub     %g1, SA(MINFRAME), %fp          ! Adjust sp to the right place
        sub     %fp, SA(MINFRAME), %sp
        ld      [%o0 + globals + 32], %g1       ! Restore global regs back
        mov     %g1, %y
        ld      [%o0 + globals + 0], %g1
        ld      [%o0 + globals + 4], %g2
        ld      [%o0 + globals + 8], %g3
        ld      [%o0 + globals + 16], %g4
        ld      [%o0 + globals + 20], %g5
        ld      [%o0 + globals + 24], %g6
        ld      [%o0 + globals + 28], %g7
        restore
/* The following 3 lines do the equivalent of: _PRE_Block = 0 */
        set     _PRE_Block, %l0
        mov     0,%l1
        stb     %l1, [%l0]
        restore
        retl
        nop

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu     Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890
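
As a cross-check on the MIPS frame offsets used in the code above (regspace, floats, registers, returnaddr), the same layout can be written down as a C struct. This is a sketch under assumptions: the struct, field, and function names are invented here, the general registers are taken to be 32 bits wide and the floating-point pairs 64 bits as on the R2000, and the host compiler is assumed to put no padding between the fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the frame that the MIPS savecontext pushes. */
struct mips_frame {
    double   fregs[6];  /* $f20, $f22, ..., $f30, starting at "floats" */
    uint32_t sregs[9];  /* s0..s8, starting at "registers" */
    uint32_t ra;        /* return address, at "returnaddr" */
};

/* The same arithmetic as the #defines in the assembly source. */
enum {
    REGSPACE   = 9 * 4 + 4 + 6 * 8,  /* nine saved regs + ra + six doubles */
    FLOATS     = 0,
    REGISTERS  = FLOATS + 6 * 8,
    RETURNADDR = REGSPACE - 4
};

/* Aborts if the struct layout and the hand-computed offsets disagree. */
void check_frame_layout(void)
{
    assert(sizeof(struct mips_frame) == REGSPACE);
    assert(offsetof(struct mips_frame, fregs) == FLOATS);
    assert(offsetof(struct mips_frame, sregs) == REGISTERS);
    assert(offsetof(struct mips_frame, ra) == RETURNADDR);
}
```

With those assumptions the frame comes to 88 bytes, with the return address in the last word at offset 84.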