Path: utzoo!attcan!uunet!lll-winken!ames!amdahl!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Interrupts & Polling [was: Re: Super Cheap IP router (< $1000)]
Message-ID: <25242@amdcad.AMD.COM>
Date: 16 Apr 89 07:01:27 GMT
References: <820@aber-cs.UUCP>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Distribution: eunet,world
Organization: [Consultant] San Mateo, CA
Lines: 122

[Excuse me if I edited previous comments to the bone, but it's getting deep...]

By the way, let me preface this by saying that I actually completely agree
with Dave Crocker [below] that polling is the way to go when you have a
few very active devices... *if* you don't need to do something else with
your "idle" time. Pure polling is often the best (and sometimes the only)
strategy in an embedded controller. My remarks in the past few postings
have been assuming that there is general-purpose timesharing going on
in the same CPU, and thus you are trying to balance those famous three --
latency, efficiency, and throughput -- *and* give useful time to the "user".

+--------------- In article <820@aber-cs.UUCP> (Piercarlo Grandi) writes:
| +--------------- In article <25223@amdcad.AMD.COM> (Rob Warnock) writes:
| | +--------------- dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) writes:
| | | Interrupts kill... Many, occasional sources of activity warrant
| | | interrupts. A few, active sources warrant polling.
| | Which is why the two-level interrupt service structure I wrote a
| | "tutorial" about in comp.arch (circa 3/20/89?) does exactly this,...
| | But once you get an interrupt, additional interrupts are queued...
| | every second-level interrupt routine checks for more work...
| | if there is any, requeues itself on the tail of the task queue.
| While I agree that the technique is useful, it requires implementing
| a lightweight process system in your kernel, which may be major surgery.
+---------------

The only real surgery, and I'll admit it's a mass of painful detail, is
to go through the kernel (especially the disk cache stuff) taking out all
the unneeded "spl7()" (a.k.a. "splhigh()") calls, and replacing them with
"splsched()". But in fact, for a quick hack, you can simply map *all* the
old calls to "splsched()", and then just put back the few you really need.
[You have to do this, because in any "standard" kernel, there are places
that hold "splhigh" for many *milliseconds*. Any needed mutual exclusion
can always be done in a few microseconds, if only by setting a lock.]
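Here is a minimal sketch of the "setting a lock" idea. It assumes C11 <stdatomic.h> and a non-preemptive uniprocessor kernel. All the names (cache_busy, cache_trylock, cache_insert_from_intr, and so on) are hypothetical and do not come from any real kernel. The critical section is a few instructions long, and hardware interrupts stay enabled throughout. The interrupt side never spins: if it finds the lock held, it backs off so its 2nd-level routine can requeue itself and try again later.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical disk-cache state, guarded by a lock flag instead of splhigh(). */
static atomic_flag cache_busy = ATOMIC_FLAG_INIT;
static int cache_entries;            /* the shared data being protected */

/* Never spins, so it is safe to call from 2nd-level interrupt code. */
static bool cache_trylock(void)
{
    return !atomic_flag_test_and_set(&cache_busy);
}

static void cache_unlock(void)
{
    atomic_flag_clear(&cache_busy);
}

/* Process (top-half) side. In a non-preemptive uniprocessor kernel, interrupt
 * code always releases the lock before returning, so this loop falls through
 * at once. Hardware interrupts stay enabled the whole time. */
static void cache_insert_from_process(void)
{
    while (!cache_trylock())
        ;
    cache_entries++;                 /* microseconds of work, not milliseconds */
    cache_unlock();
}

/* Interrupt (2nd-level) side: if process code holds the lock, report failure
 * so the caller can requeue itself on the task queue and retry later. */
static bool cache_insert_from_intr(void)
{
    if (!cache_trylock())
        return false;
    cache_entries++;
    cache_unlock();
    return true;
}
```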

As for "lightweight processes", I think you misunderstood me. There is
a "lightweight task queue", but the "processes" that are run are actually
all interrupt completion routines, all of which were already there. There
is no new "context"; you're still in "interrupt context".
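Here is a minimal sketch of such a task queue, with hypothetical names throughout. A "task" carries no process state: it is only a function pointer and a link. task_run_all() calls each routine directly, in whatever interrupt context it was entered from. The example routine does one unit of work per call and, if more remains, requeues itself on the tail, as in the quoted description above. Two busy devices therefore interleave, and neither can starve the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A 2nd-level "task" is just an existing completion routine plus a link. */
struct task {
    void (*fn)(struct task *);
    struct task *next;
    int queued;                 /* nonzero while on the queue: no double-queueing */
};

static struct task *q_head, *q_tail;

static void task_enqueue(struct task *t)
{
    if (t->queued)
        return;
    t->queued = 1;
    t->next = NULL;
    if (q_tail)
        q_tail->next = t;
    else
        q_head = t;
    q_tail = t;
}

/* Call each routine in the current (interrupt) context: no new stack,
 * no process switch, just ordinary function calls in FIFO order. */
static void task_run_all(void)
{
    while (q_head) {
        struct task *t = q_head;
        q_head = t->next;
        if (!q_head)
            q_tail = NULL;
        t->queued = 0;          /* cleared first, so the routine may requeue itself */
        t->fn(t);
    }
}

/* Example driver: one unit of work per call, then to the back of the line. */
struct dev {
    struct task task;           /* first member, so the cast below is valid */
    char name;
    int pending;
};

static char trace[32];
static int ntrace;

static void dev_done(struct task *t)
{
    struct dev *d = (struct dev *)t;
    trace[ntrace++] = d->name;
    if (--d->pending > 0)
        task_enqueue(&d->task);
}
```

If device a starts with three units of work and b with two, the routines run in the order "ababa".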

+---------------
| In a sense, you are using any interrupt as though it were
| the clock interrupt to start polling.
+---------------

Precisely! In fact, one version of this actually just used the kernel's
callout queue (the one that "timeout()" uses), and queued interrupt
completions with a time-to-run of "zero ticks", then let the next tick
of the normal "softclock" handler (which runs at the *lowest* interrupt
priority) run the 2nd-level tasks *just as if* they were timeouts which
had expired. [This assumes you can get a fast enough "hardclock". You can
have "urgent" interrupts start the queue running directly, if you like,
but you lose a lot of the efficiency benefit.]

Any new interrupts that occur while one of those is running get queued
after all the "zero" callouts but before any that are really waiting for
time to elapse. (You should maintain one more pointer, to the last of the
"zero" entries, so you don't have to scan the queue.) And of course, the
"softclock" handler
always checks the queue again before dismissing, just to see if any new
interrupts have arrived. (*And*, if the clock has ticked, it decrements the
time left on the first non-zero task and, if that reaches zero, promotes
the task into the zero group. That way your "real" timeouts stay accurate, too.
Also, the "hardclock" 1st-level handler should bump a count of "ticks
seen while softclock busy" that "softclock" can use to keep time straight.)
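Here is a minimal sketch of that callout arrangement. It makes three assumptions: a delta-encoded callout list (each entry's time is relative to the entry before it), caller-supplied callout storage, and hypothetical names throughout (callout_add, tick_once, pending_ticks, zero_tail). Zero-tick entries are interrupt completions, and zero_tail is the "one more pointer" that makes queuing a completion O(1). To keep the sketch simple, hardclock() always bumps pending_ticks and leaves all the catching up to softclock(). That is a slight generalization of counting only the ticks seen while softclock is busy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callout entry; storage is supplied by the caller. */
struct callout {
    void (*fn)(void *);
    void *arg;
    int delta;                      /* ticks after the previous entry */
    struct callout *next;
};

static struct callout *c_head;      /* delta-encoded callout list */
static struct callout *zero_tail;   /* last entry of the leading zero group, or NULL */
static int pending_ticks;           /* counted by hardclock, applied by softclock */

/* ticks == 0: queue an interrupt completion in O(1) via zero_tail.
 * ticks  > 0: ordinary timeout, delta-inserted after the zero group. */
static void callout_add(struct callout *c, int ticks)
{
    struct callout **pp = zero_tail ? &zero_tail->next : &c_head;

    if (ticks <= 0) {
        c->delta = 0;
        c->next = *pp;
        *pp = c;
        zero_tail = c;
        return;
    }
    while (*pp && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;
        pp = &(*pp)->next;
    }
    c->delta = ticks;
    c->next = *pp;
    if (c->next)
        c->next->delta -= ticks;    /* keep later entries' times unchanged */
    *pp = c;
}

/* One elapsed tick: charge it to the first non-zero entry and promote it
 * (plus anything due at the same tick) into the zero group. */
static void tick_once(void)
{
    struct callout *c = zero_tail ? zero_tail->next : c_head;

    if (c == NULL)
        return;
    assert(c->delta > 0);
    if (--c->delta == 0) {
        zero_tail = c;
        while (zero_tail->next && zero_tail->next->delta == 0)
            zero_tail = zero_tail->next;
    }
}

/* 1st-level clock handler: just count. A real kernel would also post a
 * low-priority software interrupt to run softclock(). */
static void hardclock(void)
{
    pending_ticks++;
}

/* Lowest-priority handler: apply any ticks, run one zero-group entry,
 * and recheck, until the zero group is empty; only then dismiss. */
static void softclock(void)
{
    for (;;) {
        while (pending_ticks > 0) {
            pending_ticks--;
            tick_once();
        }
        if (zero_tail == NULL)
            break;
        struct callout *c = c_head;
        c_head = c->next;
        if (c == zero_tail)
            zero_tail = NULL;
        c->fn(c->arg);
    }
}

/* Example callback: records which entry ran. */
static char trace[16];
static int ntrace;

static void record(void *arg)
{
    trace[ntrace++] = *(const char *)arg;
}
```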

+---------------
| The simple version, used e.g. in many UNIX kernels, is to have any
| and every interrupt processing procedure always check at the end for
| further pending interrupts for ITSELF, and then go into a loop. Even a
| little busy waiting, if it is known that there will be a next interrupt
| shortly, may be worthwhile, e.g. when reading packets/bursts from
| line on which you are running a protocol.
+---------------

That's fine, and should be used when appropriate. It mixes well with
the two-level style... as long as you don't leave everybody else's
interrupts turned off during that loop. That's the fundamental problem
the two-level scheme is trying to solve: to decouple the fast-latency
needs of simple hardware (e.g., SIOs at 38400 baud) from the efficiency
concerns of not taking/dismissing a bunch of [heavyweight] interrupts
when you know there's still more work to do.

By keeping the 1st-level interrupts *very* lightweight (*don't* save any
context [except maybe a working reg or two], just grab the data, queue it,
then queue your 2nd-level handler), you can afford a *lot* of interrupts,
many more than you would think. And by leaving hardware interrupts *enabled*
during [almost] all 2nd-level processing, you don't lose data due to
latency problems.
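Here is a minimal sketch of such a 1st-level receive handler for one serial line. It uses a single-producer, single-consumer ring buffer, with C11 atomics to order the two sides. The names are hypothetical, and in a real kernel this part might be a handful of assembler instructions. sio_rx_intr() stashes the byte and tells its caller whether the 2nd-level handler still needs to be queued, so that handler is queued at most once per burst.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u              /* power of two, so masking wraps the index */

/* Hypothetical receive state for one serial line. */
struct sio_ring {
    uint8_t buf[RING_SIZE];
    atomic_uint head;               /* advanced only by the 1st-level handler */
    atomic_uint tail;               /* advanced only by the 2nd-level handler */
    atomic_bool queued;             /* 2nd-level handler already on the task queue? */
    unsigned overruns;
};

/* 1st-level handler: grab the byte, stash it, and return true if the caller
 * must queue the 2nd-level handler. No locks and no other calls. */
static bool sio_rx_intr(struct sio_ring *r, uint8_t byte)
{
    unsigned h = atomic_load(&r->head);
    unsigned next = (h + 1) & (RING_SIZE - 1);

    if (next == atomic_load(&r->tail)) {
        r->overruns++;              /* ring full: drop the byte, count it */
    } else {
        r->buf[h] = byte;
        atomic_store(&r->head, next);   /* publish after the data is written */
    }
    return !atomic_exchange(&r->queued, true);
}

/* 2nd-level handler: runs with hardware interrupts enabled. Clearing 'queued'
 * before draining means a byte arriving mid-drain queues us again; at worst
 * that costs one extra pass over an empty ring. 'out' must hold RING_SIZE bytes. */
static unsigned sio_rx_drain(struct sio_ring *r, uint8_t *out)
{
    unsigned n = 0;
    unsigned t = atomic_load(&r->tail);

    atomic_store(&r->queued, false);
    while (t != atomic_load(&r->head)) {
        out[n++] = r->buf[t];
        t = (t + 1) & (RING_SIZE - 1);
        atomic_store(&r->tail, t);
    }
    return n;
}
```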

And by doing as much 2nd-level work [normal C-language "interrupt handlers"]
as possible before dismissing -- that is, by letting 1st-level interrupts
continue to add to the 2nd-level work queue and not dismissing until the
work queue is empty -- you only do one full save/restore for the lot.
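Here is a minimal sketch of "one full save/restore for the lot". To keep it short, it uses a pending-work bitmask in place of the FIFO task queue. As a result, devices run in unit-number order within each pass rather than in arrival order. All names are hypothetical, and full_saves only marks where a real kernel would do its expensive context save and restore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NDEV 8

typedef void (*handler_fn)(int unit);

static handler_fn second_level[NDEV];  /* hypothetical per-device 2nd-level routines */
static atomic_uint pending;            /* bit i set by device i's 1st-level handler */
static unsigned full_saves;            /* counts the expensive save/restore pairs */

/* 1st-level side: mark the device as having work, nothing more. */
static void post_work(int unit)
{
    atomic_fetch_or(&pending, 1u << unit);
}

/* Common 2nd-level entry: one full save, then keep draining (including work
 * posted by interrupts that arrive meanwhile) until nothing is left. */
static void dispatch(void)
{
    unsigned work;

    full_saves++;                      /* stand-in for saving the full context */
    while ((work = atomic_exchange(&pending, 0u)) != 0) {
        for (int i = 0; i < NDEV; i++)
            if ((work & (1u << i)) && second_level[i] != NULL)
                second_level[i](i);
    }
    /* a real kernel would restore the full context and dismiss here */
}

/* Example handler; device 0's first run simulates device 1 interrupting
 * while 2nd-level work is in progress. */
static int runs[NDEV];

static void example_handler(int unit)
{
    runs[unit]++;
    if (unit == 0 && runs[0] == 1)
        post_work(1);
}
```

When work is posted for device 0 and dispatch() is called, both handlers run, including one for an "interrupt" that arrived mid-pass, yet full_saves ends up at 1.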

+---------------
| I do not like much the idea of having an interrupt routine at the end
| fire up polling in other drivers. (If I understood correctly what you
| are thinking about).
+---------------

Why not? Shouldn't they get the same benefits as "your" driver?  ;-}  ;-}

Or maybe you don't like the idea of every interrupt triggering a lot
of "polling". Well, that doesn't really happen. You only "poll" those
other handlers for which interrupts occurred (and thus got queued)
while you were in your "dally" loop... and that's why you don't shut
off system-wide interrupts in your dally loop, just the device you
are polling. And then when you decide nothing else is going to happen
[typically after about 1.5 to 3 times the expected interval to the next event], you
re-enable your interrupt, return to the common interrupt "scheduler"
[see above], and Lo & Behold! If your device interrupts while somebody
*else* is handling a burst, you'll get control again before the ultimate
"dismiss" occurs.
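Here is a minimal sketch of such a dally loop. Several pieces are assumptions for illustration: the device hooks in struct dally_dev, the free-running timer, and factor_tenths, where 15 to 30 gives the "1.5 to 3 times" rule of thumb. Only this device's interrupt is masked. The sketch also assumes a level-triggered or latched device interrupt, so an event that slips in just before the unmask still interrupts afterwards. The small simulation at the end exists only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device hooks; a real driver would read its status registers. */
struct dally_dev {
    bool (*rx_ready)(struct dally_dev *);
    void (*rx_take)(struct dally_dev *);                 /* consume one event */
    void (*set_intr)(struct dally_dev *, bool enabled);  /* THIS device only */
    unsigned long (*now)(void);                          /* free-running timer */
    unsigned long expected_gap;                          /* expected time between events */
    unsigned factor_tenths;                              /* 15..30 => 1.5x..3x */
};

/* Handle a burst by polling; give up after factor x expected_gap with no
 * new event. Everyone else's interrupts stay enabled throughout. */
static unsigned dally(struct dally_dev *d)
{
    unsigned long window = d->expected_gap * d->factor_tenths / 10;
    unsigned handled = 0;

    d->set_intr(d, false);
    unsigned long deadline = d->now() + window;
    for (;;) {
        if (d->rx_ready(d)) {
            d->rx_take(d);
            handled++;
            deadline = d->now() + window;     /* an event restarts the window */
        } else if ((long)(d->now() - deadline) >= 0) {
            break;                            /* wraparound-safe: window expired */
        }
    }
    d->set_intr(d, true);   /* then return to the common dispatcher */
    return handled;
}

/* ---- Simulation, only to exercise dally(): the clock advances per read ---- */
static unsigned long sim_clock;
static const unsigned long sim_events[] = {3, 6, 9, 40};   /* arrival times */
static unsigned sim_next;
static bool sim_intr_on = true;

static unsigned long sim_now(void) { return sim_clock++; }

static bool sim_ready(struct dally_dev *d)
{
    (void)d;
    return sim_next < 4 && sim_events[sim_next] <= sim_clock;
}

static void sim_take(struct dally_dev *d) { (void)d; sim_next++; }
static void sim_set_intr(struct dally_dev *d, bool on) { (void)d; sim_intr_on = on; }
```

With an expected gap of 3 and a factor of 2.0, the loop handles the events at 3, 6, and 9, then gives up and unmasks. The event at 40 is left for the device's own interrupt.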

This saves *lots* of context save/restore CPU time. Lots.


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403