Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!harpo!floyd!cmcl2!rna!n44a!wjh12!foxvax1!brunix!rayssd!raybed2!mjg
From: mjg@raybed2.UUCP
Newsgroups: net.unix-wizards,net.bugs.4bsd
Subject: The correct clock fix in 4bsd
Message-ID: <219@raybed2.UUCP>
Date: Wed, 30-Nov-83 09:05:06 EST
Article-I.D.: raybed2.219
Posted: Wed Nov 30 09:05:06 1983
Date-Received: Sun, 4-Dec-83 03:14:33 EST
Lines: 76


4.1bsd	CLOCK FIX	CLOCK FIX	CLOCK FIX	CLOCK FIX  4.1bsd

     The following commentary and C code lines have been extracted from
our sys/clock.c file.  Please note the one line added which was missing
from the original code.

     Like a lot of others, we noticed our clock was losing time, and we
had our operators reset the date every day.  This bothered me a lot, so I
started looking into it.  The first thing we tried was calling DEC, who
suggested we try the system with VMS.  We did over a weekend and the clock
was fine, so it had to be UNIX (4.1bsd).  I then put debugging code into
the hardclock routine to find out whether we were missing clock
interrupts, and we were.

     The next test was to make a table 128 entries long with a pointer at
the top.  Each time we entered the hardclock routine and an interrupt had
been missed (I also gave lbolt an extra tick) I incremented the pointer
and then stuffed the PC (from the stack) into the table.  A user program
was then written to continuously read the table from /dev/kmem.  This gave
no answer to the clock problem but did tell us where the kernel spends
most of its time (in open, read, and write).

     A few months later I attacked the kernel again, when I figured out
that it only happened when the system was very busy.  We would lose one
clock tick every 5 seconds when the system was busy (load average > 30)
and one tick every 60 seconds if the load average was between 10 and 30.
I then did a very close examination of all the kernel code in C and
assembler, looking for a place that raised the IPL and somehow bypassed
the code lowering it.  After 2-3 days I found it.

     In the softclock routine, which calls all the callouts, the priority
did not get lowered after the last call: the loop raises the priority with
spl7() before checking for more work, and then breaks out without calling
splx().  This causes the rest of the softclock routine to run at the
raised priority, which blocks further hardclock interrupts.  The softclock
routine always calls the vmmeter routine, which does not take too long
unless (time % 5 == 0); then it calls vmtotal, which when added to vmmeter
and softclock takes a very long time to run.  By the time softclock is
finally done we have missed a clock interrupt and the next one has already
arrived.  After softclock and hardclock finish we return to whoever was
running before, with the IPL taken from the original stack, which in most
cases returns the IPL to zero.
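
     For anyone who wants to repeat the missed-interrupt experiment, the
table described above might look roughly like the sketch below.  This is
not the code we ran: the names pctab, missed and record_missed are made up
for illustration, the wrap-around at 128 entries is an assumption, and the
assert() is only there because this version is meant to be tried in user
space rather than in the kernel.

```c
#include <assert.h>

#define NPCTAB 128              /* table size given in the text */

/* Hypothetical layout: a slot index followed by the saved PCs. */
struct pctab {
    unsigned int  next;         /* slot the next sample goes into */
    unsigned long pc[NPCTAB];   /* PC that was running when a tick was missed */
};

static struct pctab missed;

/*
 * Would be called from the clock handler each time it detects a
 * missed interrupt; pc is the interrupted PC taken from the stack.
 */
static void record_missed(unsigned long pc)
{
    assert(missed.next < NPCTAB);
    missed.pc[missed.next] = pc;
    missed.next = (missed.next + 1) % NPCTAB;   /* assumed: wrap, keep newest 128 */
}
```

     A user program can then read the whole structure out of /dev/kmem
at the symbol's address and see which PCs turn up most often.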

		Martin Grossman        allegra!rayssd!raybed2!mjg
		617-274-7100
		ext 3395 or 4793

===========================================================================
/*
 * Software clock interrupt.
 * This routine runs at lower priority than device interrupts.
 */
/*ARGSUSED*/
softclock(pc, ps)
	caddr_t pc;
{
	register struct callout *p1;
	register struct proc *pp;
	register int a, s;
	caddr_t arg;
	int (*func)();

	/*
	 * Perform callouts (but not after panic's!)
	 */
	if (panicstr == 0) {
		for (;;) {
			s = spl7();
			if((p1 = calltodo.c_next) == 0 || p1->c_time > 0){
				(void) splx(s);	/* the fix: this line was missing */
				break;
			}
			calltodo.c_next = p1->c_next;
			arg = p1->c_arg;
			func = p1->c_func;
			p1->c_next = callfree;
			callfree = p1;
			(void) splx(s);
			(*func)(arg);
		}
	}
===========================================================================
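
     The effect of the missing line can be checked outside the kernel
with a small model of the loop above.  This is only a sketch under stated
assumptions: the IPL is modeled as a plain integer, spl7() and splx() are
stand-ins that just set and restore it, and run_callouts, demo, bump and
ncalls are hypothetical names that do not appear in sys/clock.c.

```c
#include <assert.h>
#include <stddef.h>

static int ipl;                         /* modeled interrupt priority level */

/* Stand-ins for the kernel primitives: each returns the previous level. */
static int spl7(void)  { int old = ipl; ipl = 7; return old; }
static int splx(int s) { int old = ipl; ipl = s; return old; }

struct callout {
    int             c_time;             /* ticks remaining; 0 means due now */
    void          (*c_func)(void *);
    void           *c_arg;
    struct callout *c_next;
};

static struct callout  calltodo;        /* list head, as in the excerpt */
static struct callout *callfree;
static int             ncalls;

static void bump(void *arg) { (void) arg; ncalls++; }

/* Same loop shape as softclock; returns the IPL in effect after it exits. */
static int run_callouts(int restore_on_exit)
{
    struct callout *p1;
    void (*func)(void *);
    void *arg;
    int s;

    for (;;) {
        s = spl7();
        if ((p1 = calltodo.c_next) == NULL || p1->c_time > 0) {
            if (restore_on_exit)
                (void) splx(s);         /* the line that was missing */
            break;
        }
        calltodo.c_next = p1->c_next;
        arg = p1->c_arg;
        func = p1->c_func;
        p1->c_next = callfree;
        callfree = p1;
        (void) splx(s);
        (*func)(arg);
    }
    return ipl;
}

/* Queue two callouts that are already due, then run the loop from IPL 0. */
static int demo(int restore_on_exit)
{
    static struct callout c[2];

    ipl = 0;
    ncalls = 0;
    callfree = NULL;
    c[0].c_time = 0; c[0].c_func = bump; c[0].c_arg = NULL; c[0].c_next = &c[1];
    c[1].c_time = 0; c[1].c_func = bump; c[1].c_arg = NULL; c[1].c_next = NULL;
    calltodo.c_next = &c[0];
    return run_callouts(restore_on_exit);
}
```

     In this model demo(0) comes back with the level still at 7, while
demo(1) comes back at 0, and both callouts run either way.  That matches
what we saw: the callouts themselves behave, and only the code that runs
after the loop is affected.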