Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!bloom-beacon!oberon!cit-vax!mangler
From: mangler@cit-vax.Caltech.Edu (Don Speck)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Keywords: readahead, striping, file mapping
Message-ID: <6963@cit-vax.Caltech.Edu>
Date: 16 Jun 88 06:32:08 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP>
    <23288@bu-cs.BU.EDU> <7980@alice.UUCP> <23326@bu-cs.BU.EDU>
Organization: California Institute of Technology
Lines: 71

In article <23326@bu-cs.BU.EDU>, bzs@bu-cs.BU.EDU (Barry Shein) writes:
> I think the proper question is sort/merging a disk farm and doing 1000
> transactions/sec or more while keeping 8 or 12 tapes turning at or
> near their rated 200 ips, not pushing bits thru a single channel

The hard part of this is getting enough disk throughput to feed even
one of those 200-ips tape drives.  The rest is replication.

Channels sound like essentially moving the disk driver into an I/O
processor, with lists of channel control blocks being analogous to
lists of struct buf's.  This makes it feasible to do more
optimizations, even real-time stuff like scatter-gather, chaining,
and rotational scheduling.

Barry mentions the UDA-50 as being similar.  But its processor is an
8085, and DMA speed is only 0.8 MB/s, making it much slower than a
dumb controller.  And the driver ends up spending as much time
constructing the channel control blocks as it would spend tending a
dumb controller like the Emulex SC7003.  The Xylogics 450, Xylogics
472, and DEC TS11 are like this too.  I find them all disappointingly
slow.

I suspect the real reason for channel processors is to reduce
interrupts, which are so costly on big CPU's.  It makes sense for
terminals; people have made I/O processors that talk to Unix in
clists (KMC-11's, etc) which cuts the total interrupt rate by a
large fraction.
But I don't think it's necessary, or necessarily desirable, to
inflict this on disks & tapes, and certainly not unless the channel
processor can talk in struct buf's.

For all the optimizations that these I/O processors are supposed to
do, Unix rarely gives them the chance.  Unless there are more than
two requests outstanding at once, then once they finish one, there is
only one request left to choose from.  Unix has minimal readahead, so
that's as many requests as a single process can generate.  Raw I/O
is even worse.

Asynchronous reads would be the obvious way to get enough requests
in the queue to optimize, but that seems unlikely to happen.  Rather,
explicit read commands are giving way to memory-mapped files (in
Mach and SunOS 4.0) where readahead becomes synonymous with
prepaging.  It remains to be seen whether much attention is put into
this.

Barry credits the asynchronous nature of I/O on mainframe OS's to
the access methods, like RMS on VMS.  People avoid those when they
want speed (imagine using dbm to do sequential reads).  For instance,
the VMS "copy" command bypasses RMS when copying disk-to-disk, with
the curious result that it's faster to copy to a disk than to the
null device, because the null device is record-oriented, requiring
RMS.

As DMR demonstrates, parallel-transfer disks are great for big
files.  They're horrendously expensive though, and it's hard enough
to find controllers that keep up with even 3 MB/s, much less 10 MB/s.
But they can be simulated with ordinary disks by striping across
multiple controllers, *if* the disks rotate as one.  Does anyone know
of a cost-effective disk that can phase-lock its spindle motor to
that of a second disk, or perhaps with the AC line?  With
direct-drive electronically-controlled motors becoming common, this
should be possible.  The Eagle has such a motor, but no provision for
external sync.  I recall stories of Crays using phase-locked disks to
advantage.
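
To make the striping idea concrete, here is a minimal sketch in C of
how a driver might map a logical block onto one of several drives.
It assumes plain round-robin striping, one filesystem block per
drive in turn; NDISKS, struct stripe_loc, and stripe_map() are
hypothetical names made up for illustration, not taken from any
existing driver.

```c
#include <assert.h>

/* Assumption: four drives, each on its own controller. */
#define NDISKS 4

/* Where a logical block lands in the striped set (hypothetical type). */
struct stripe_loc {
    int  disk;   /* which drive holds the block */
    long blkno;  /* block number within that drive */
};

/*
 * Round-robin striping: consecutive logical blocks go to consecutive
 * drives, so NDISKS neighbouring blocks share the same per-drive
 * block number and can be transferred in parallel.
 */
static struct stripe_loc stripe_map(long lblk)
{
    struct stripe_loc loc;

    assert(lblk >= 0);               /* negative block numbers are invalid */
    loc.disk  = (int)(lblk % NDISKS);
    loc.blkno = lblk / NDISKS;
    return loc;
}
```

For example, stripe_map(5) gives disk 1, block 1.  Logical blocks 4
through 7 all land on block 1 of their respective drives, which is
why the spindles need to turn in step: only then do those four blocks
pass under the heads at the same moment.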

Of course, to get the most from high transfer rates, you need large
blocksizes; DMR's example looked like about one revolution.  Hence
the extent-based file allocation of mainframe OS's, etc.  Perhaps
it's time to pester Berkeley to double MAXBSIZE to 16384 bytes?  It
would use 0.3% of memory for additional kernel page tables on a VAX,
but proportionately less on machines with larger page sizes.  8192 is
practically the *minimum* blocksize on Suns, these days.

The one point that nobody mentioned is that you don't want the CPU
copying the data around between kernel and user address spaces when
there's a lot!  (Maybe it was just too obvious).

Don Speck   speck@vlsi.caltech.edu  {amdahl,ames!elroy}!cit-vax!speck