Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!harpo!floyd!cmcl2!rna!n44a!wjh12!foxvax1!brunix!rayssd!raybed2!mjg
From: mjg@raybed2.UUCP
Newsgroups: net.unix-wizards,net.bugs.4bsd
Subject: The correct clock fix in 4bsd
Message-ID: <219@raybed2.UUCP>
Date: Wed, 30-Nov-83 09:05:06 EST
Article-I.D.: raybed2.219
Posted: Wed Nov 30 09:05:06 1983
Date-Received: Sun, 4-Dec-83 03:14:33 EST
Lines: 76

4.1bsd   CLOCK FIX   CLOCK FIX   CLOCK FIX   CLOCK FIX   4.1bsd

The following commentary and C code lines have been extracted from our
sys/clock.c file. Please note the one added line, which was missing from
the original code.

Like a lot of others, we noticed our clock was losing time, and we had our
operators reset the date every day. This bothered me a lot, so I started
looking into it. The first thing we tried was calling DEC, who suggested we
try the system with VMS. We did so over a weekend and the clock was fine, so
it had to be unix (4.1bsd).

I then put debugging code into the hardclock routine to find out whether we
were missing clock interrupts, and we were. The next test was to make a
table 128 entries long with a pointer at the top. Each time we entered the
hardclock routine and an interrupt had been missed (I also gave lbolt an
extra tick), I incremented the pointer and then stuffed the PC (from the
stack) into the table. A user program was then written to continuously read
the table from /dev/kmem. This gave no answer to the clock problem, but it
did tell us where the kernel spends most of its time (in open, read, and
write).

A few months later I attacked the kernel again, once I had figured out that
the problem only happened when the system was very busy. We would lose one
clock tick every 5 seconds when the system was busy (load average > 30) and
one tick every 60 seconds if the load average was between 10 and 30. I then
did a very close examination of all the kernel code, in C and assembler,
looking for code that raised the IPL and somehow bypassed the code that
lowers it. After 2-3 days I found it.

In the softclock routine, while it runs through the callouts, the priority
did not get lowered after the last call: each pass of the loop raises it
with spl7(), and the exit path broke out of the loop without calling splx().
This causes the rest of the softclock routine to run at hardclock priority,
which blocks further hardclock interrupts. The softclock routine always
calls the vmmeter routine, which does not take too long unless
(time % 5 == 0); in that case it also calls vmtotal, and vmtotal, vmmeter,
and softclock together take a very long time to run. When softclock is
finally done, we have missed a clock interrupt and the next one has already
arrived. After softclock and hardclock finish, we return to whatever was
running before, with the IPL taken from the original stack, which in most
cases returns the IPL to zero.

Martin Grossman
allegra!rayssd!raybed2!mjg
617-274-7100 ext 3395 or 4793

===========================================================================
/*
 * Software clock interrupt.
 * This routine runs at lower priority than device interrupts.
 */
/*ARGSUSED*/
softclock(pc, ps)
        caddr_t pc;
{
        register struct callout *p1;
        register struct proc *pp;
        register int a, s;
        caddr_t arg;
        int (*func)();

        /*
         * Perform callouts (but not after panic's!)
         */
        if (panicstr == 0) {
                for (;;) {
                        s = spl7();
                        if ((p1 = calltodo.c_next) == 0 || p1->c_time > 0) {
                                /* the next line is missing from the original code */
                                (void) splx(s);
                                break;
                        }
                        calltodo.c_next = p1->c_next;
                        arg = p1->c_arg;
                        func = p1->c_func;
                        p1->c_next = callfree;
                        callfree = p1;
                        (void) splx(s);
                        (*func)(arg);
                }
        }
===========================================================================
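
For anyone who wants to see the effect without booting a kernel, here is a
rough user-space sketch of the loop above. It is not kernel code. The IPL is
modelled as a plain integer, the levels 0 and 7 are toy values rather than
real VAX IPLs, and the names sim_spl7(), sim_splx() and run_callouts() are
made up for this illustration. It only shows which priority is left in
effect when the callout loop exits, with and without the added splx() line.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated interrupt priority level; a toy stand-in, not a VAX register. */
static int ipl = 0;

/* Hypothetical stand-in for spl7(): raise the level, return the old one. */
int sim_spl7(void)
{
    int old = ipl;
    ipl = 7;
    return old;
}

/* Hypothetical stand-in for splx(): restore a previously saved level. */
void sim_splx(int s)
{
    ipl = s;
}

/* Cut-down callout entry: only the fields the loop test looks at. */
struct callout {
    int c_time;              /* ticks still to wait; 0 means due now */
    struct callout *c_next;
};

/*
 * Walk the due callouts the same way the posted loop does.
 * If apply_fix is nonzero, the level is restored before the final break
 * (the added line). Returns the level in effect when the loop exits.
 */
int run_callouts(struct callout *head, int apply_fix)
{
    struct callout *todo = head;   /* local copy, the list is not modified */
    struct callout *p1;
    int s;

    ipl = 0;                       /* assumed starting level for the demo */
    for (;;) {
        s = sim_spl7();
        p1 = todo;
        if (p1 == NULL || p1->c_time > 0) {
            if (apply_fix)
                sim_splx(s);       /* the line the original code lacked */
            break;
        }
        todo = p1->c_next;
        sim_splx(s);
        /* the kernel would call (*func)(arg) here at the lowered level */
    }
    return ipl;
}
```

In this model, run_callouts() with apply_fix set to 0 leaves the level at 7
on exit, even for an empty list, which corresponds to the rest of softclock
running with clock interrupts blocked. With apply_fix set to 1 the level is
back at 0 when the loop exits.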