Xref: utzoo comp.unix.internals:1352 comp.unix.sysv386:2907
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.unix.internals,comp.unix.sysv386
Subject: The performance implications of the ISA bus
Message-ID:
Date: 10 Dec 90 18:24:30 GMT
References: <1990Dec5.144445.18632@abcfd20.larc.nasa.gov>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 132
Nntp-Posting-Host: odin
In-reply-to: jcburt@ipsun.larc.nasa.gov's message of 5 Dec 90 14:44:45 GMT
X-Old-Subject: Re: Jargon file v2.1.5 28 NOV 1990 -- part 5 of 6

On 5 Dec 90 14:44:45 GMT, jcburt@ipsun.larc.nasa.gov (John Burton) said:

jcburt> In article
jcburt> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pst> You're comparing CPU performance to I/O performance. [ ... ] Back
pst> when there were REAL(tm) computers like 780, a lot of time and
pst> energy went into designing efficient I/O from the CPU bus to the
pst> electrons going to the disk or tty. [ ... ] Sure OS's and apps have
pst> gotten bloated, but when you put a chip like the MIPS R3000 on a
pst> machine barely more advanced than an IBM-AT you end up with a toy
pst> that can think fast but can't do anything.

pcg> No, no, no, no, no, no, no. The IO bandwidth of a typical 386 is
pcg> equivalent or better than that of any UNIBUS based machine, and, in
pcg> practical terms, equivalent to that of MASSBUS based ones. You can get
pcg> observable raw disc data rates of 600-900KB/s and observable filesystem
pcg> bandwidths of 300-500KB/s under SVR3.2 (with suitable controllers and a
pcg> FFS of some sort). This is way better than a PDP-11.

jcburt> True, a typical 386 machine has good I/O bandwidth, but
jcburt> bandwidth isn't everything. The majority of 386 machines have an
jcburt> ISA bus which is a very simple bus controlled by the cpu.
jcburt> When performing I/O, the cpu blocks itself and turns control of the
jcburt> bus to the I/O device.

This is not quite true. Actually it is not true at all. You seem to be
describing synchronous programmed IO, which is not used in most ISA
peripherals. Most ISA peripherals are interrupt driven, and even use DMA,
and the CPU can work between interrupts. Definitely.

jcburt> Machines that were originally designed as a multi-user platform
jcburt> usually where set up so that the I/O could be performed without
jcburt> the direct control (or blocking) of the cpu. The system bus was
jcburt> designed so that multiple operations could occur more or less
jcburt> independent of the cpu (multi-tasking hardware design).

This is entirely true of the ISA bus and any PC system around. Hey, they
even have DMA (well, read on).

However, I can easily see that your misconceptions have a root in three
problems with typical ISA machines, one that is particular to the design
of a PC clone, and two that are particular to the most common disk
controller design for such machines.

For a very ugly reason, the DMA chips that perform DMA under CPU control
are nearly useless for high speed transfers, and on some designs the
braindamage is bad enough that the few slow DMA channels available cannot
even be shared. But there is no such restriction for DMA driven by a
peripheral board itself, not by the CPU, and some (rare) boards have bus
mastering ability and have their own DMA onboard.

Since DMA using the CPU controlled DMA channels is so bad, the standard WD
style AT controller does not use DMA. It is interrupt driven, so while the
controller is seeking the disk or transferring data the CPU is free. When
the controller is done seeking and transferring, the CPU gets an
interrupt, and then copies byte by byte, with a very fast block move, the
sector read from the controller's onboard cache to core.
This is indeed done using programmed IO, synchronously, and the CPU is
busy while doing it, but it takes relatively little time.

Finally, the common type of ISA disk controller, for other relatively ugly
reasons, is single threaded. This means that it cannot overlap seeks and
transfers to/from multiple disks. It cannot overlap multiple transfers
because of the above mentioned sector buffer; there is only one sector
buffer... In theory it could overlap seeks on two drives, or seeking on
one with transfer on another, and indeed this can be done with seek
buffering (ST506) devices using a clever (and obscene) hack.

The really big problem for multiuser operation is the lack of overlap; the
authors of the UNIX disk driver sort routine report that with a
multithreaded controller on a PDP-11, three moving arm disks operating in
parallel give, under typical timesharing loads, the same performance as if
they were a single fixed arm one with the sum of their capacities.

This means that with a multithreaded disk controller, three disks, and
typical timesharing load, the ability to move three arms in parallel is
the same as having a single zero seek time arm. A big, big, big win. Two
disks on a multithreaded disk controller are already a very large
improvement over a single disk for timesharing, especially if you spread
the (instantaneous) load across them by careful positioning of your
partitions.

Now back to the ISA bus. As somebody observes elsewhere, the IO
bottlenecks of a timesharing system are the terminal lines and the disk
controllers. If you use intelligent terminal controllers and intelligent
multithreaded disk controllers your timesharing performance will be
impressive, on a par with that of a VAX of the same class. Just using FIFO
based serial line controllers substantially reduces terminal IO overhead;
just using two ESDI controllers, one per disk, will give tremendous
improvements, because the two controllers will be able to seek and
transfer in parallel.
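
To put a rough number on the FIFO point above, here is a minimal sketch. It
assumes 10 bits per character on the wire (start, 8 data, stop), one receive
interrupt per character on a UART without a FIFO, and one interrupt per 14
characters on a UART with a 16 byte FIFO (a common trigger setting on
16550A-type parts; no specific chip is named above, so that is an
assumption). The function and parameter names are made up for illustration.

```python
# Rough estimate of worst-case receive interrupt rates for serial lines.
# Assumptions (not from the post): 10 bits per character on the wire,
# one interrupt per character without a FIFO, and one interrupt per
# `trigger_level` characters with a FIFO (14 is a hypothetical setting
# modelled on 16550A-type UARTs).

def chars_per_second(baud, bits_per_char=10):
    """Characters per second that one line carries at the given baud rate."""
    return baud / bits_per_char

def interrupts_per_second(ports, baud, trigger_level=1):
    """Receive interrupts per second with every port running flat out."""
    return ports * chars_per_second(baud) / trigger_level

if __name__ == "__main__":
    no_fifo = interrupts_per_second(ports=8, baud=19200)
    with_fifo = interrupts_per_second(ports=8, baud=19200, trigger_level=14)
    print(f"8 ports, no FIFO  : {no_fifo:.0f} interrupts/sec")
    print(f"8 ports, with FIFO: {with_fifo:.0f} interrupts/sec")
```

Under these assumptions the interrupt rate drops by the trigger level, which
is where the reduction in terminal IO overhead comes from; real loads are
rarely saturated, so treat the figures as an upper bound.
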
If you want higher performance use a microprocessor based intelligent
serial line controller, and something like an AHA 154x disk controller,
which is multithreaded, bus mastering, and has its own fast DMA channels.

Ah, a final note: if you really want high performance from your multiuser
ISA machine, DO NOT use the console in any way. Access to video RAM is so
abysmally slow that it could consume a large portion of your bus
bandwidth. If you want to do fast graphics on an ISA machine, buy an X
terminal and a fast Ethernet board, and don't use the console, unless you
get a really expensive super intelligent video board with very fast truly
16 bit memory. I think that for timesharing the X terminal solution is
still better, and not much more expensive, because it allows further
overlap between the generation of the graphics and its rendering on the
screen.

In summary: to saturate an ISA bus (5 MB/sec) you need a pretty large
number of peripherals running continuously, such as more than three disks
(say 800KB/sec each) and a network board (say 600KB/sec), which brings us
to about 2/3 of nominal. Things like a QIC tape (90KB/sec), 8 serial ports
(20KB/sec for eight ports simultaneously at 19200 baud), and so on are
irrelevant for bandwidth.

You then have a problem with the typically high interrupt processing
overheads of 386 UNIX systems, with their often badly written drivers, but
if you use the right controllers even these are not that important.

Let's say that a machine with 8 FIFO based serial lines, 2 discs with
< 20msec seek time attached to an AHA154x, a 386/25 noncaching motherboard
(4 MIPS, let's say), and 16 MBytes can comfortably support 8 users doing
fairly heavy development work, even using things like G++ and GNU Emacs.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
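
The bandwidth budget in the summary above can be tallied with a few lines of
code. This is only a sketch: the device rates and the 5 MB/sec nominal figure
are the "say" estimates quoted in the post, 1 MB is taken as 1000 KB for
simplicity (an assumption), and three disks are used as the baseline for
"more than three". The names below are made up for illustration.

```python
# Back-of-the-envelope ISA bus budget using the figures from the summary.
# Assumption: 1 MB/sec = 1000 KB/sec, so the nominal bus rate is 5000 KB/sec.

ISA_NOMINAL_KB_S = 5000

# (device, count, KB/sec each), rates as quoted in the post
DEVICES = [
    ("disk", 3, 800),
    ("network board", 1, 600),
    ("QIC tape", 1, 90),
    ("8 serial ports", 1, 20),
]

def total_load_kb_s(devices):
    """Sum of the continuous transfer rates of all devices, in KB/sec."""
    return sum(count * rate for _, count, rate in devices)

def utilisation(devices, nominal=ISA_NOMINAL_KB_S):
    """Fraction of the nominal bus bandwidth the devices would use."""
    return total_load_kb_s(devices) / nominal

if __name__ == "__main__":
    for name, count, rate in DEVICES:
        print(f"{count} x {name:15s} {count * rate:5d} KB/sec")
    print(f"total {total_load_kb_s(DEVICES)} KB/sec, "
          f"{utilisation(DEVICES):.0%} of nominal")
```

With three disks the tape and serial ports together add barely 2% of nominal,
which is the sense in which they are irrelevant for bandwidth; a fourth disk
at the same rate would push the total to roughly three quarters.
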