Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!rutgers!bellcore!ka9q.bellcore.com!karn
From: karn@ka9q.bellcore.com (Phil Karn)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Super Cheap IP router (< $1000)
Message-ID: <15321@bellcore.bellcore.com>
Date: 14 Apr 89 05:35:32 GMT
References: <8904140206.AA10084@ucbvax.Berkeley.EDU>
Sender: news@bellcore.bellcore.com
Reply-To: karn@ka9q.bellcore.com (Phil Karn)
Organization: Secular Humanists for No-Code
Lines: 69

My experiences match those of Dave Crocker almost exactly. Per-character interrupts on the IBM PC are deadly. Minimizing the number of host-level interrupts per byte transferred is the single most important optimization you can make in almost any PC communications program.

The problem is that existing PC hardware has virtually no support for anything else. Background polling is usually out of the question, as most applications are complex enough to make it highly inconvenient to poll often enough to avoid missing input. DMA is virtually unusable, given the limited number of channels on the original PC plus a desire to be backward compatible with that machine. This leaves interrupt-driven I/O and busy-wait loops.

I recently did a driver for the PC that handles an HDLC controller connected to a 56 kbps amateur packet radio modem. (Yes, we've made some progress since the InterOp 88 and Ann Arbor IETF demos. :-)) At 142 microseconds between characters, there was no way I could make it run in interrupt-driven mode, nor could I tolerate an interrupt from another source while the interface was active.

I therefore designed the driver to use only one interrupt: demodulator carrier detect. The presence of carrier causes the host to enter a polling loop on the receive status register with interrupts disabled. It stays there, receiving frames, until the carrier goes away. The transmit routine is simpler: it just busy-waits on the transmitter with interrupts off, sending frames as long as it has frames to send.
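The receive side of this scheme can be sketched in C. This is only an illustration of the control flow, not the actual driver: the status bits, the `sim_dev` structure and the function names are all hypothetical, and the hardware is replaced by a scripted simulation so the loop can be run anywhere. On a real PC the status and data values would come from I/O port reads, and the loop would run with interrupts disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; a real HDLC controller defines its own layout. */
#define ST_CARRIER   0x01u  /* demodulator carrier detect is asserted */
#define ST_RX_READY  0x02u  /* a received byte is waiting in the data register */
#define ST_END_FRAME 0x04u  /* the waiting byte is the last one of a frame */

/* One poll of the simulated device: the status seen and the byte behind it. */
struct sim_step {
    uint8_t status;
    uint8_t data;
};

/* Simulated controller that replays a fixed script, one step per poll. */
struct sim_dev {
    const struct sim_step *script;
    size_t len;
    size_t pos;
};

/* Return the next scripted step; once the script runs out, carrier is gone. */
static struct sim_step sim_poll(struct sim_dev *d)
{
    struct sim_step idle = { 0, 0 };
    return d->pos < d->len ? d->script[d->pos++] : idle;
}

/*
 * Body of a carrier-detect interrupt handler (sketch).
 * Spins on the status register until carrier drops, copying received
 * bytes into buf. Returns the number of complete frames seen and stores
 * the number of bytes kept in *nbytes.
 */
size_t carrier_detect_poll(struct sim_dev *d, uint8_t *buf, size_t cap,
                           size_t *nbytes)
{
    size_t frames = 0;
    size_t n = 0;

    assert(d != NULL && buf != NULL && nbytes != NULL);

    /* A real driver would disable interrupts here. */
    for (;;) {
        struct sim_step s = sim_poll(d);

        if (!(s.status & ST_CARRIER))
            break;              /* carrier dropped: leave the loop */
        if (!(s.status & ST_RX_READY))
            continue;           /* nothing yet: keep spinning */
        if (n < cap)
            buf[n++] = s.data;  /* bytes beyond cap are dropped */
        if (s.status & ST_END_FRAME)
            frames++;
    }
    /* A real driver would re-enable interrupts here. */

    *nbytes = n;
    return frames;
}
```

Feeding the function a script in which carrier stays up across two frames and then drops returns a frame count of two and leaves every byte after the carrier loss unread. The cost is visible in the structure: nothing else can run between entering the loop and the `break`.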
The scheme works, but is much less than satisfactory. Whenever the channel is active, all other activity on the system freezes. Keystrokes are not echoed. Even the $#@!! time-of-day clock freezes (why computer designers have this fetish for complex interrupt-driven software clocks instead of simple read-only hardware binary counters driven by oscillators, I'll never understand).

The irony of this situation is that it wouldn't be so bad if the modem were faster; the PC would spend less time sending each packet. There is enough real time, even on a 4.77 MHz PC, to spin around the wait loop on the device a few times for each character that is actually sent or received. But the inter-character time is not long enough to go off and do any other useful work, so it goes to waste. It's sort of like making a cross-country airline trip with several hour-long connections. They're long enough to become a significant fraction of the total trip, but each one is too short to do anything but sit around each terminal, waiting.

Just having lots of FIFO buffering on each I/O card would be an enormous help. It would be really nice to use the 80286 INS (block input) instruction to slurp several kilobytes out of a FIFO that had been loaded by the line controller without direct processor intervention. Considering the speed of this instruction, the total bus overhead would actually be less than DMA since you can avoid the bus arbitration that has to go on for each DMA transfer.

Better yet is enough FIFO buffering plus hardware smarts to handle several packets without host intervention. Except for the newer Ethernet controllers, the slave I/O CPU seems to be the only way to do this. But this is not to say that the link or higher protocols should be executed on the controller -- its job should be strictly limited to buffering for the purpose of alleviating the host processor's real-time constraints.
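The timing argument can be put in numbers with a little arithmetic. The sketch below is an illustration only: the function names are made up, it assumes 8 bits per character and ignores HDLC bit stuffing, and the 2048-byte FIFO depth in the example is an arbitrary figure chosen to stand for "several kilobytes", not a property of any real card.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds between successive 8-bit characters at a given line rate. */
uint64_t char_time_ns(uint64_t bits_per_sec)
{
    assert(bits_per_sec > 0);
    return 8ull * 1000000000ull / bits_per_sec;
}

/* CPU clock cycles that elapse during one character time. */
uint64_t cycles_per_char(uint64_t cpu_hz, uint64_t bits_per_sec)
{
    assert(bits_per_sec > 0);
    return cpu_hz * 8ull / bits_per_sec;
}

/*
 * Longest time the host may ignore the device before a receive FIFO of
 * fifo_bytes overflows, in nanoseconds. With no FIFO (depth 1) this is
 * just one character time.
 */
uint64_t service_deadline_ns(uint64_t bits_per_sec, uint64_t fifo_bytes)
{
    assert(bits_per_sec > 0);
    return fifo_bytes * 8ull * 1000000000ull / bits_per_sec;
}
```

At 56000 bits per second, `char_time_ns` gives 142857 ns, the 142-microsecond figure quoted earlier, and `cycles_per_char(4770000, 56000)` gives 681 clock cycles per character on a 4.77 MHz machine: enough for a few trips around a tight polling loop, not enough to take an interrupt and do other work. With a hypothetical 2048-byte FIFO, `service_deadline_ns(56000, 2048)` stretches the deadline to roughly 293 milliseconds, which is the whole case for buffering on the card.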
Right now, my "slave I/O CPU" is a dedicated PC/XT with an Ethernet interface on one side and the packet radio interface on the other. It sits in the corner, gatewaying packets between the local Ethernet and the radio channel (the real Ether). Most people need a cheaper solution, so a friend (Mike Chepponis, K3MC) is designing a slave I/O processor for the PC that contains a V40 CPU, several hundred K of RAM and one or more 8530 HDLC chips.

As an additional aside, polling is the standard technique used in electronic telephone switches. Imagine an interrupt-driven switch when all the phones come off-hook simultaneously...

Phil