Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!rphroy!caen!hellgate.utah.edu!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.internals
Subject: Re: Implementing a multitasking OS on top of UNIX
Message-ID: <12991@dog.ee.lbl.gov>
Date: 9 May 91 13:24:18 GMT
References: <1991May9.015124.20638@casbah.acns.nwu.edu> <1991May9.040020.26194@ux1.cso.uiuc.edu> <1991May9.055804.6550@casbah.acns.nwu.edu>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 110
X-Local-Date: Thu, 9 May 91 06:24:18 PDT

In article <1991May9.055804.6550@casbah.acns.nwu.edu> craig@casbah.acns.nwu.edu (Craig Robinson) writes:
>Just for my own edification though, what does happen at the CPU level when
>a process makes a system call?

It depends on the CPU.

The VAX has four `chm' instructions (chm[uesk]).  Unix uses only chmk,
`change mode to kernel'.  This is done as, e.g.:

	pushl	$1		# argument to syscall
	chmk	$1		# exit(1)

The chmk changes to kernel mode, sets the stack pointer to the Kernel
Stack Pointer `ksp' (it was the User Stack Pointer `usp'), and jumps to
the `chmk' vector.  (Actually it reads the chmk vector out of the scb,
and uses the low 2 bits to decide whether to use the kernel stack or the
interrupt stack, or to halt.)  The parameter to chmk is pushed on the
new stack after the previous psl and pc.  That is:

	*--ksp = psl;
	*--ksp = pc;
	*--ksp = <chmk parameter>;

The BSD VAX kernel follows this by pushing the T_SYSCALL type (not
actually used; it just makes the trap() and syscall() frames the same)
and the usp, then calls syscall().  To return from a chm? instruction
you pop the chm argument (`tstl (sp)+' or `addl2 $4,sp') and execute an
`rei' (the semantics of rei are horribly complicated; see a VAX
architecture manual).  The BSD kernel takes advantage of the register
save mask to get the user's registers saved at entry to syscall() itself.
(They are thus on the stack and can be modified and will be reloaded
on return automatically.)

The Tahoe has a `kcall' instruction.  It works a lot like the VAX chmk.

The 680x0 has 16 `trap' instructions.  (Well, actually one, with a
parameter in the range 0..15.)  The OS author decides to use one for
system calls, and chooses how to encode the calls.  SunOS, HPUX, and
Utah's HP-BSD all use trap 0 and put the system call number in d0 (with
the rest of the parameters on the stack).  The trap switches to kernel
mode (thus getting the kernel stack pointer), pushes the pc (4 bytes)
and the sr (2 bytes), and jumps through the trap vector.  The BSD trap-0
vector clears another 2 bytes to realign the stack (important on the
680[234]0 for performance), then pushes all the user's registers with
`moveml #0xffff,sp@-'.  The BSD kernel then saves the user SP (the
moveml pushed the kernel stack pointer) and pushes the system call
number again (!) and calls syscall().  It then pops the system call
number, reloads the user stack pointer, does a `moveml sp@+,#0x7fff' to
recover everything except the user stack pointer, adds 6 to sp to pop
the usp and the alignment word, and then jumps to a routine that fakes a
VAX `rei' (checking for pseudo ASTs: rather silly but a bit difficult to
clean up).

The SPARC has the `t' (trap) instruction, which has a 7-bit parameter
for (in effect) 128 trap instructions.  The OS author decides to use one
for system calls, and chooses how to encode the calls.  SunOS and BSD
both use software trap 0 and put the system call number in %g1, with the
rest of the parameters in the trapping routine's %o0..%o5.  (For the
indir system call, the 7th parameter is found on the routine's stack.
The usual case involves no memory traffic, however.)
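To make the 680x0 and SPARC register conventions above concrete, here is a small C sketch of how a kernel's syscall() routine might pick up the call number and arguments on each machine.  It is only a model: user memory is a toy array of 32-bit words, register snapshots are plain arrays, and every name (`struct umem`, `fetch`, `m68k_args`, `sparc_args`, `SYS_INDIR_MODEL`) is hypothetical.  Three assumptions are labeled in the comments as well: that 680x0 arguments start one word above the saved user SP (just past the return address into the caller), that the SPARC slot holding indir's extra parameter is the calling convention's seventh-argument slot at %sp + 92, and that indir's call number is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NARGS_MAX 6
#define SYS_INDIR_MODEL 0   /* hypothetical: indir's call number, assumed 0 here */

/* Toy user address space (1 KB), read as aligned 32-bit words. */
struct umem {
	uint32_t word[256];
};

static uint32_t fetch(const struct umem *m, uint32_t addr)
{
	assert(addr % 4 == 0 && addr / 4 < 256);
	return m->word[addr / 4];
}

/*
 * 680x0: call number in d0 (regs[0]), arguments on the user stack.
 * regs[15] is the a7 slot, holding the saved user SP.  Assumption: the
 * arguments start one word above it, just past the return address.
 */
static uint32_t m68k_args(const uint32_t regs[16], const struct umem *m,
                          uint32_t args[], size_t nargs)
{
	assert(nargs <= NARGS_MAX);
	for (size_t i = 0; i < nargs; i++)
		args[i] = fetch(m, regs[15] + 4 * (uint32_t)(i + 1));
	return regs[0];
}

/*
 * SPARC: call number in %g1, arguments in the trapping routine's
 * %o0..%o5 (o[0]..o[5]; o[6] is %sp).  For indir, %o0 holds the real
 * call number, so the arguments shift down one register and a sixth
 * one, if any, comes from the stack.  Assumption: it sits in the
 * seventh-argument slot at %sp + 92.
 */
static uint32_t sparc_args(uint32_t g1, const uint32_t o[8], const struct umem *m,
                           uint32_t args[], size_t nargs)
{
	assert(nargs <= NARGS_MAX);
	if (g1 != SYS_INDIR_MODEL) {
		for (size_t i = 0; i < nargs; i++)
			args[i] = o[i];          /* registers only: no memory traffic */
		return g1;
	}
	for (size_t i = 0; i < nargs; i++)
		args[i] = (i + 1 < NARGS_MAX) ? o[i + 1] : fetch(m, o[6] + 92);
	return o[0];
}
```

In this model the SPARC path reads user memory only for indir, which matches the remark that the usual case involves no memory traffic; the 680x0 path always has to go to the user stack.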
All SPARC traps, including interrupts, work the same way: they
decrement the current window pointer in the psr, write the pc and npc
into what are now %l1 and %l2, copy the psr `S' (supervisor) bit into
the `PS' (previous supervisor) bit, clear the `ET' (enable traps) bit,
and set pc to the trap base register plus 16 times the trap vector
index, with npc (the `next' pc, exposed due to delayed branches) set to
the word after that.  (Software traps start at 128 and go to 255.
Hardware traps use the range 0..127.)  That is all that the hardware
does: it does not set up a kernel stack pointer, or save things to a
stack.  The rest is up to software.

My kernel:

 - branches to syscall and saves %psr in %l0 in the delay slot;

 - at syscall, invokes a hairy macro called TRAP_SETUP:

	if (trap came from kernel mode) {	// i.e., PS bit in %psr is set
		if (we are in the trap window)
			save the trap window somewhere;
		%sp = %fp - stackspace;		// here stackspace=80
	} else {
		compute the number of user windows;
		if (we are in the trap window)
			save the trap window somewhere;
		%sp = (top of kernel stack) - stackspace;
	}

   where the number of user windows is:

	cpcb->pcb_uw = (cpcb->pcb_wim - 1 - CWP) % nwindows

   which is computed via table lookup (the pcb_wim field is maintained
   by software; it is simply log2(%wim));

 - enables traps, stores the saved psr (%l0), pc (%l1), npc (%l2), %y
   (read into %l3), the values of %g1 through %g7 and the caller's %o0
   through %o7 (now our %i0..%i7) into the 80 bytes reserved above;

 - calls syscall(), passing the address of the stuff just built on the
   kernel stack.

Note that it is possible, but wrong, to get a kernel-mode system call.
Therefore, part of the work in TRAP_SETUP could be dispensed with, but
for the fact that the trap window must be saved anyway, even if we are
just going to panic.  Since the delay slot for that test is filled for
both cases, this is only a single instruction; the loss is minor.
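The hardware trap sequence and the decisions TRAP_SETUP makes can be modelled in a few lines of C.  This is a sketch under stated assumptions, not kernel code: all names are hypothetical, the trap base is assumed to be 4096-byte aligned (so adding 16 times the trap type is the same as filling in the tt field), `in the trap window' is taken to mean that the new CWP is the window marked invalid in %wim, and the window save itself is not modelled.  The model also sets S to 1, which the hardware does on every trap although the description does not spell it out.  pcb_uw is computed with modular arithmetic instead of the kernel's table lookup, because a plain % applied to the possibly negative (or, in unsigned arithmetic, wrapped) difference pcb_wim - 1 - CWP can give the wrong answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACKSPACE 80   /* frame size reserved by TRAP_SETUP in the post */

/* Only the processor state the description mentions (names hypothetical). */
struct cpu {
	unsigned cwp;          /* current window pointer, from the psr */
	bool s, ps, et;        /* psr supervisor, previous supervisor, enable traps */
	uint32_t pc, npc;
	uint32_t l1, l2;       /* %l1 and %l2 of the window the trap enters */
	uint32_t tbr_base;     /* trap base address, assumed 4096-byte aligned */
	unsigned nwindows;     /* number of register windows (implementation-dependent) */
};

struct setup_result {
	uint32_t sp;           /* new %sp */
	bool save_trap_window; /* must the trap window be saved first? */
	unsigned uw;           /* pcb_uw; computed only for traps from user mode */
};

/* `t n' is software trap n, i.e. trap type 128 + n. */
static unsigned soft_trap_type(unsigned n)
{
	assert(n < 128);
	return 128 + n;
}

/* Everything the hardware does on a trap; note that no memory is touched. */
static void hw_trap(struct cpu *c, unsigned tt)
{
	assert(tt < 256 && c->nwindows > 0);
	c->cwp = (c->cwp + c->nwindows - 1) % c->nwindows;  /* CWP - 1, wrapping */
	c->l1 = c->pc;
	c->l2 = c->npc;
	c->ps = c->s;
	c->s = true;                      /* the hardware also enters supervisor mode */
	c->et = false;
	c->pc = c->tbr_base + 16 * tt;    /* same as filling in TBR's tt field */
	c->npc = c->pc + 4;
}

/* (pcb_wim - 1 - CWP) mod nwindows, kept non-negative for unsigned math. */
static unsigned user_windows(unsigned pcb_wim, unsigned cwp, unsigned nwindows)
{
	assert(pcb_wim < nwindows && cwp < nwindows);
	return (pcb_wim + 2 * nwindows - 1 - cwp) % nwindows;
}

/* The decisions TRAP_SETUP makes; actually saving a window is not modelled. */
static struct setup_result trap_setup(const struct cpu *c, unsigned pcb_wim,
                                      uint32_t fp, uint32_t kstack_top)
{
	struct setup_result r = { 0, false, 0 };

	/* Assumption: "in the trap window" means CWP is the %wim-invalid window. */
	r.save_trap_window = (c->cwp == pcb_wim);
	if (c->ps) {
		/* Trap from kernel mode: build the frame below the current one. */
		r.sp = fp - STACKSPACE;
	} else {
		/* Trap from user mode: start at the top of the kernel stack. */
		r.uw = user_windows(pcb_wim, c->cwp, c->nwindows);
		r.sp = kstack_top - STACKSPACE;
	}
	return r;
}
```

For example, a user-mode `t 0' taken in window 3 of an 8-window CPU lands in window 2 at the trap base plus 0x800; with pcb_wim = 6 the user still has windows 3, 4 and 5 in the register file, so pcb_uw comes out as 3.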
The `save the trap window somewhere' is complicated but is done as a
subroutine (with linkage being stored in %l4, leaving only 3 registers
free in the save code in some cases, but that turns out to be *just*
enough).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov