Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!amdcad!ames!lll-lcc!lll-tis!helios.ee.lbl.gov!pasteur!agate!saturn!edler@cmcl2.nyu.edu
From: edler@cmcl2.nyu.edu (Jan Edler)
Newsgroups: comp.os.research
Subject: Re: Question on thread/process scheduler interaction
Message-ID: <5460@saturn.ucsc.edu>
Date: 14 Nov 88 19:26:30 GMT
Sender: usenet@saturn.ucsc.edu
Organization: New York University, Ultracomputer project
Lines: 77
Approved: comp-os-research@jupiter.ucsc.edu


This topic is a few weeks old, but some kind of news problem here
delayed my posting this.

In article <5296@saturn.ucsc.edu> fouts@lemming. (Marty Fouts) writes:
>N tasks (threads) are started on top of M processes on a
>multiprocessor with P processors (N >= M >= P > 1).
...
>One task enters the critical region in one process.  While there, a
>second task running on a different process spinlocks on the semaphore.
...
>The process level scheduler preempts the process running the task in
>the critical region leaving the second process at the spinlock.  The
>worst case is when the process is preempted by a third process
>belonging to the same job which also reaches the spinlock.

Our solution to this problem is a mechanism called "temporary nonpreemption".
A process informs the kernel of temporary conditions that make preemption
unadvisable.  The scheduler in the kernel honors this advice, up to a limit
to prevent abuse.  This solves the problem as long as the maximum time in
the critical section is less than the limit the scheduler is willing to honor.

The implementation is very efficient.
Two special communication variables are used.  They are ordinary integers,
residing in the user's address space, but their location is known to the
scheduler.  Both are initialized to 0.  The first is set to a non-zero
value by the user to request temporary non-preemption, and the second is set
to a non-zero value by the scheduler to warn the user that preemption will
soon be required.  We assume the scheduler runs like an interrupt handler.

Here is pseudo-code:

	tempnopreempt()	/* request temporary nonpreemption */
	{
		var1++;
	}

	tempokpreempt()	/* relinquish temporary non-preemption */
	{
		int temp;	/* saved copy of the scheduler's warning */

		if (--var1 == 0 && (temp=var2) != 0) {
			var2 = 0;
			yield();
			return temp;
		}
		return 0;
	}
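
As a rough illustration of how these calls might bracket a spinlock-protected
critical section, here is a minimal user-level sketch in C.  It is not the
Ultracomputer code.  The spinlock, shared_counter, and locked_increment() are
hypothetical names invented for the example, POSIX sched_yield() merely stands
in for the new yield() call, and var1/var2 are plain globals rather than
variables whose location the kernel actually knows.

```c
#include <assert.h>
#include <sched.h>      /* sched_yield(): stand-in for the post's yield() */
#include <stdatomic.h>

/* Communication variables.  In the real scheme the scheduler knows where
 * these live; here they are ordinary globals (assumption). */
volatile int var1 = 0;  /* nonzero: user requests temporary nonpreemption */
volatile int var2 = 0;  /* nonzero: scheduler has asked for preemption */

atomic_flag lock = ATOMIC_FLAG_INIT;  /* hypothetical spinlock */
int shared_counter = 0;               /* hypothetical data guarded by lock */

void tempnopreempt(void)
{
    var1++;
}

int tempokpreempt(void)
{
    int temp;

    assert(var1 > 0);   /* must pair with an earlier tempnopreempt() */
    if (--var1 == 0 && (temp = var2) != 0) {
        var2 = 0;
        sched_yield();  /* assumption: substitute for yield() */
        return temp;    /* 1 = preemption was requested, 2 = it was forced */
    }
    return 0;
}

/* Hold off preemption for the whole time the lock is held, so that other
 * processes are not left spinning on a lock whose holder is not running. */
int locked_increment(void)
{
    int status;

    tempnopreempt();
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;               /* spin until the lock is free */
    shared_counter++;
    atomic_flag_clear_explicit(&lock, memory_order_release);
    status = tempokpreempt();
    return status;
}
```

The request is made before spinning on the lock, so there is no window in
which the process holds the lock without having asked for nonpreemption.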

Yield() is a new system call that reschedules the processor.
Because var1 is incremented and decremented, these calls nest.
When the scheduler wants to preempt the currently running process,
it executes code like this:

	if (var1 == 0)
		ok to preempt;
	else if (preemption not already pending for this process) {
		var2 = 1;	/* notify user */
		note preemption pending;
	}
	else if (preemption pending for maximum allowable time) {
		var2 = 2;	/* time is up! */
		ok to force preemption;
	}

The pending preemption status is cleared on actual preemptions and on
calls to yield().  The purpose of the tempokpreempt() return value is to notify
the user (after the fact) if preemption was requested or forced.
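
The scheduler-side decision, together with the pending status, can be modeled
as a small state machine.  The following C sketch is a user-space model only,
written under stated assumptions: struct proc, want_preempt(), proc_yield(),
and MAX_PENDING_TICKS are hypothetical names, and "maximum allowable time" is
measured here simply by counting later preemption attempts, which may not be
how a real kernel would measure it.

```c
#include <assert.h>

enum { MAX_PENDING_TICKS = 3 };  /* hypothetical limit, in preemption attempts */

struct proc {
    int var1;           /* written by the user: nonpreemption nesting count */
    int var2;           /* written by the scheduler: 1 = warning, 2 = forced */
    int pending;        /* nonzero while a preemption is pending */
    int pending_ticks;  /* preemption attempts seen since it became pending */
};

/* Called whenever the scheduler would like to preempt p.
 * Returns 1 if p is to be preempted now, 0 if preemption is deferred. */
int want_preempt(struct proc *p)
{
    assert(p->var1 >= 0);
    if (p->var1 == 0) {
        p->pending = 0;             /* actual preemption clears the status */
        return 1;
    } else if (!p->pending) {
        p->var2 = 1;                /* notify user */
        p->pending = 1;
        p->pending_ticks = 0;
        return 0;
    } else if (++p->pending_ticks >= MAX_PENDING_TICKS) {
        p->var2 = 2;                /* time is up */
        p->pending = 0;             /* forced preemption clears the status */
        return 1;
    }
    return 0;                       /* still within the grace period */
}

/* Model of the yield() system call: the pending status is cleared. */
void proc_yield(struct proc *p)
{
    p->pending = 0;
    p->pending_ticks = 0;
}
```

In this model var2 is left at 2 after a forced preemption, so the user's next
tempokpreempt() would still report that the preemption was forced.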

Overhead is only a couple of instructions in the common case where no preemption
would have occurred anyway, and only the overhead of yield() otherwise.
The most abusive a user can get is to lengthen a time slice by a small amount,
and this can be compensated for by shortening the next time slice.
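
One way the compensation might be computed is sketched below in C.  This is
purely illustrative: next_slice() and its parameters are hypothetical, the
units are arbitrary ticks, and the floor on slice length is an added
assumption rather than anything specified above.

```c
#include <assert.h>

/* Hypothetical helper: length of the next time slice, given the nominal
 * slice length and the amount by which the previous slice was extended
 * through temporary nonpreemption.  The result never drops below min_slice. */
int next_slice(int nominal, int overrun, int min_slice)
{
    int s;

    assert(overrun >= 0 && min_slice >= 0 && nominal >= min_slice);
    s = nominal - overrun;          /* give back what was borrowed */
    return s < min_slice ? min_slice : s;
}
```

For example, with a nominal slice of 100 ticks and a 5-tick extension, the
next slice would be 95 ticks.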

Jan Edler
NYU Ultracomputer Project
edler@nyu.edu
...!cmcl2!edler
(212) 998-3353