Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.arch
Subject: Re: Intel/MIPS Dhrystone ratio
Message-ID: <24929@amdcad.AMD.COM>
Date: 21 Mar 89 05:18:34 GMT
References: <1552@vicom.COM> <28200290@mcdurb>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Organization: [Consultant] San Mateo, CA
Lines: 128

In article <28200290@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
+---------------
| Bravo! Who needs vectored interrupts? 
| How often does your device know better where to interrupt to than you do?
+---------------

When I first began designing with the Am29000, all my old habits felt
cramped by "only" 4 levels of external interrupt, which don't even
read a vector from the interrupting device. But I quickly realized that since
the 29k has a "count-leading-zeroes" (CLZ) instruction, all you need is a
magic external location you can read (can you spell 74F374?) which gives you
one bit per interrupting device, and an inclusive-OR to your single interrupt
line. (Who needs 4 of them, anyway?) So you load the bits, CLZ, add a table
base, and jump...

Given slow 8-bit I/O chips, that takes a lot less time than a vector fetch.
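
[A minimal sketch of the load/CLZ/index/jump sequence, in C rather than 29k
assembly. Everything named here is hypothetical: pending_latch stands in for
the external status register, the latch is assumed to be 32 bits wide with
the most significant bit belonging to device 0, and clz32() is a portable
substitute for the single CLZ instruction.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DEVICES 32

typedef void (*handler_fn)(int dev);

/* Hypothetical stand-in for the external latch: one bit per device.
   Assumption: bit 31 (MSB) is device 0, so CLZ yields the index directly. */
volatile uint32_t pending_latch;

handler_fn handler_table[NUM_DEVICES];
int hits[NUM_DEVICES];            /* per-device counters, for illustration */

/* Portable count-leading-zeros; returns 32 when no bit is set. */
int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Example handler: count the interrupt and clear the device's bit,
   which here simulates clearing the interrupting hardware. */
void record_hit(int dev)
{
    hits[dev]++;
    pending_latch &= ~(0x80000000u >> dev);
}

void init_handlers(void)
{
    for (int i = 0; i < NUM_DEVICES; i++)
        handler_table[i] = record_hit;
}

/* Single interrupt entry point.
   Returns the device index serviced, or -1 if nothing was pending. */
int dispatch_interrupt(void)
{
    uint32_t bits = pending_latch;     /* load the bits          */
    int idx = clz32(bits);             /* CLZ                    */
    if (idx >= NUM_DEVICES)
        return -1;                     /* spurious interrupt     */
    assert(handler_table[idx] != NULL);
    handler_table[idx](idx);           /* table base + idx, jump */
    return idx;
}
```

[With this bit assignment the lowest-numbered pending device always wins,
so the wiring of the latch sets the hardware-side service order.]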

+---------------
| But...  how can interrupt (not exception) handling be made better/worse?
| As an erstwhile systems programmer in a real-time OS, I know that we often
| wished that interrupts could be treated exactly like processes,
| going through the same priority or deadline driven scheduler.
| Yet applying RISC principles to the hardware that would be needed to do
| something like this, I often arrive at the conclusion that a 
| simple single entry point first level handler is all that is appropriate.
| Everything else seems to need sequencing.
+---------------

I agree.

[Tutorial alert. Many of you know this already. But it's worth saying once
or twice a decade, and I haven't heard it lately, so here goes...]

As has been done by many of us on a variety of machines, a useful interrupt
software "style" (good on many CISCs as well as RISCs) seems to be to split
interrupt handlers into a "first-level"/hardware-oriented/assembly-language
section, and a "second-level"/software-oriented/C-language part, with the
following characteristics:

- You leave the "real" hardware interrupts always enabled (especially during
  2nd-level handlers, system calls, etc.).

- When an interrupt occurs, all you do is clear the interrupting hardware,
  grab whatever really volatile data there might be, and queue up the
  2nd-level handler to run -- if it's really needed ("soft"-DMA can often
  just stash the data in a buffer and dismiss). If there's already a 2nd-level
  handler running at the same or higher *2nd-level* priority [see below],
  you just queue up a task block, and IRET. The trick is that the *hardware*
  interrupt is disabled only for the brief moment when a 1st-level handler
  is running.
  
- The Unix "spl??()" [Set Priority Level] routines are modified to manipulate
  a *software* notion of priority, which is respected by the 2nd-level routines
  and system-call level code (but not the hardware), and never turn off the
  hardware enables.  Necessary exclusion with 1st-level handlers is done with
  *very* short interrupt disable periods, or none at all. (Treating the
  1st-level handlers like "DMA devices", you can usually find a way to eliminate
  the IOFFs).

- The interface between 1st- & 2nd-level sections is a little "task queue",
  sort of a light-weight "real-time scheduler". You can have one, or any
  number of interrupt task queues, not necessarily related at all to whatever
  hardware priorities you are stuck with.

- Once you start running a 2nd-level routine, you continue taking tasks off
  the 2nd-level queue(s) until they are empty, before restoring the CPU state
  and dismissing. (Since hardware*interrupts are still on, it is quite
  possible that more than one 2nd-level routine gets run per CPU state save.)

- If you *can* get by with just one 2nd-level priority, do so. It avoids
  the extra state saving that comes with preempting multi-level priorities.
  (I know, sometimes you can't avoid it. But sometimes you can. On one
  system we just used the Unix "callout" queue, just setting a zero delay
  time if the task was for an interrupt.)
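
[The points above can be sketched as a single-priority task queue in C.
This is a simulation under stated assumptions, not code from any real
kernel: all names (first_level_interrupt, spl_raise, spl_lower, and so on)
are hypothetical, save_full_state() merely counts how often a full C
context would have been saved, and "interrupts" are ordinary function
calls, so the short races a real port must close are only marked in
comments.]

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 16u

typedef void (*task_fn)(int arg);
struct task { task_fn fn; int arg; };

static struct task queue[QUEUE_SIZE];
static unsigned head, tail;      /* free-running; slot = value % QUEUE_SIZE */
static int second_level_active;  /* nonzero while the queue is being drained */
static int soft_blocked;         /* software "spl": set = defer 2nd level    */
int state_saves;                 /* simulated full-context saves             */
int tasks_run;

static void save_full_state(void)    { state_saves++; }
static void restore_full_state(void) { }

static int enqueue(task_fn fn, int arg)
{
    if (tail - head == QUEUE_SIZE)
        return -1;                       /* queue full */
    queue[tail % QUEUE_SIZE].fn = fn;
    queue[tail % QUEUE_SIZE].arg = arg;
    tail++;
    return 0;
}

/* Second level: save state once, then run tasks until the queue is empty. */
static void run_second_level(void)
{
    second_level_active = 1;
    save_full_state();
    while (head != tail) {
        struct task t = queue[head % QUEUE_SIZE];
        head++;
        assert(t.fn != NULL);
        t.fn(t.arg);                     /* hardware interrupts stay enabled */
    }
    restore_full_state();
    /* A real port needs a very short interrupt-off window around the final
       emptiness check and this store. */
    second_level_active = 0;
}

/* First level: queue the task; start the second level only if it is not
   already running and software priority allows it. Returns -1 if dropped. */
int first_level_interrupt(task_fn fn, int arg)
{
    /* (clear the device and grab volatile data here: omitted) */
    if (enqueue(fn, arg) != 0)
        return -1;
    if (!second_level_active && !soft_blocked)
        run_second_level();
    return 0;                            /* the "IRET" */
}

/* Software-only priority: the hardware enables are never touched. */
void spl_raise(void) { soft_blocked = 1; }

void spl_lower(void)
{
    soft_blocked = 0;
    if (head != tail && !second_level_active)
        run_second_level();
}

/* Example task: while it runs, `arg` further interrupts arrive one by one. */
void bursty_task(int arg)
{
    tasks_run++;
    if (arg > 0)
        first_level_interrupt(bursty_task, arg - 1);
}
```

[Calling first_level_interrupt(bursty_task, 3) runs four tasks on a single
simulated state save, which is the burst behaviour described above. A task
posted between spl_raise() and spl_lower() stays queued until spl_lower().]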

The advantages of this style are these:

1. Since hardware interrupts are never turned off for long, input data
   overruns are easy to avoid. (...unlike some Unixes which turn off the
   world whenever they are searching the buffer cache!!! No wonder so many
   people think Unix can't do 19200 baud input. At the same time, you save
   some hardware cost, since the need for real DMA hardware is lessened.)

2. The 1st-level tasks can usually be done in a few assembly instructions
   without saving very much CPU state; the 2nd-level tasks need a full
   C context, reentrant and "interruptible" -- a lot more state. Since
   interrupts are often "bursty", the two-level structure saves state
   *once* for several interrupts, a significant efficiency gain. In fact,
   interrupt handling gets more efficient the higher the interrupt rate.

3. Most interrupts from "character" devices can be handled entirely in
   the 1st-level handlers as "soft-DMA", or "pseudo-DMA", thus lessening
   further the number of full CPU state saves done.

4. Since hardware and software priorities now have nothing to do with
   each other, you can allocate priorities more rationally. For example,
   you may have a multi-line serial card which has one interrupt level
   for all the transmitters and receivers on the card; also in the system
   is a disk. In this case, the 1st-level serial-I/O handler will probably
   want to queue input (received) data to be processed at a *higher* 2nd-level
   priority than the disk, but queue output (transmit done) interrupts at a
   *lower* priority than the disk.
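
[Point 4 can be illustrated with one small queue per software priority.
This is a sketch with hypothetical names throughout (post_task, drain_tasks,
the PRI_* and TASK_* constants). Tasks are reduced to integer IDs written
to a log. It only orders tasks that are already queued; preempting a
running lower-priority 2nd-level task would need the extra state saving
mentioned earlier and is not shown.]

```c
#include <assert.h>

/* Software priorities, unrelated to any hardware interrupt level. */
enum { PRI_SERIAL_TX = 0, PRI_DISK = 1, PRI_SERIAL_RX = 2, NUM_PRI = 3 };
enum { TASK_TX_DONE = 1, TASK_DISK_DONE = 2, TASK_RX_DATA = 3 };

#define QLEN    8
#define LOG_LEN 32

static int queue[NUM_PRI][QLEN];  /* one small ring of task IDs per priority */
static int head[NUM_PRI];
static int count[NUM_PRI];

int run_log[LOG_LEN];             /* order in which tasks were run */
int run_count;

/* Called from 1st-level handlers. Returns -1 if that queue is full. */
int post_task(int pri, int id)
{
    if (count[pri] == QLEN)
        return -1;
    queue[pri][(head[pri] + count[pri]) % QLEN] = id;
    count[pri]++;
    return 0;
}

/* One hardware interrupt level serves the whole serial card, yet its two
   kinds of event are queued at different software priorities. */
void serial_card_first_level(int rx_ready, int tx_done)
{
    if (rx_ready)
        post_task(PRI_SERIAL_RX, TASK_RX_DATA);
    if (tx_done)
        post_task(PRI_SERIAL_TX, TASK_TX_DONE);
}

void disk_first_level(void)
{
    post_task(PRI_DISK, TASK_DISK_DONE);
}

/* 2nd level: always take the next task from the highest non-empty queue. */
void drain_tasks(void)
{
    for (;;) {
        int pri = NUM_PRI - 1;
        while (pri >= 0 && count[pri] == 0)
            pri--;
        if (pri < 0)
            break;                       /* every queue is empty */
        int id = queue[pri][head[pri]];
        head[pri] = (head[pri] + 1) % QLEN;
        count[pri]--;
        assert(run_count < LOG_LEN);
        run_log[run_count++] = id;       /* "run" the task */
    }
}
```

[If the disk interrupt arrives first and the serial card then reports both
receive data and transmit done, drain_tasks() still runs the receive task,
then the disk task, then the transmit task.]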

Applying the above to a Version 7 Unix port to a 5.5 MHz 68000 (years ago),
we were able to take a system which could hardly do a single 2400-baud UUCP
and get it to cheerfully handle three simultaneous 9600-baud UUCPs! ...and
with no change to the hardware: interrupt-per-character SIO chips.
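
[For interrupt-per-character chips like these, the "soft-DMA" idea might
look roughly like the sketch below. It is not the code from that port.
The names are hypothetical, and the policy of waking the 2nd level only on
a newline or a full buffer is an assumption chosen for illustration.]

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUF_SIZE 64

static char rx_buf[RX_BUF_SIZE];
size_t rx_len;               /* characters currently stashed               */
int second_level_wakeups;    /* times a 2nd-level task would be queued     */
int overruns;                /* characters dropped because buffer was full */

/* 1st-level receive handler: stash the character and dismiss.
   Assumed policy: request the 2nd level only when a line is complete
   or the buffer has just filled. */
void rx_first_level(char c)
{
    if (rx_len == RX_BUF_SIZE) {
        overruns++;
        return;
    }
    rx_buf[rx_len++] = c;
    if (c == '\n' || rx_len == RX_BUF_SIZE)
        second_level_wakeups++;  /* stand-in for queueing the 2nd-level task */
}

/* 2nd-level handler: take everything buffered so far, in full C context.
   A real system must guard against the 1st level appending during the
   copy, for example by swapping between two buffers. */
size_t rx_second_level(char *dst, size_t cap)
{
    size_t n = rx_len;
    assert(cap >= n);
    for (size_t i = 0; i < n; i++)
        dst[i] = rx_buf[i];
    rx_len = 0;
    return n;
}
```

[Feeding the six characters of "hello\n" through rx_first_level() requests
the 2nd level once rather than six times.]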


[Note: When the 29000 takes an interrupt, volatile state (PC, PS) is "frozen"
in backup or shadow registers in the CPU, and execution continues (with some
slight restrictions). An "IRET" restores the running process's state from the
shadow registers. Instructions exist to read/write the shadow registers if a
full save/restore is to be done.

The very-light-weight "freeze mode" interrupt matches very nicely with the
above interrupt software style. You dedicate a few protected global registers
to freeze-mode processing, and *no* state has to be explicitly saved/restored
unless a 2nd-level handler needs to be started in a full "C" context.]


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403