Xref: utzoo comp.lang.misc:5416 comp.unix.wizards:23632
Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!rpi!sci.ccny.cuny.edu!phri!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.lang.misc,comp.unix.wizards
Subject: UNIX does *not* fully support asynchronous I/O
Message-ID: <11576:Aug2503:18:3790@kramden.acf.nyu.edu>
Date: 25 Aug 90 03:18:37 GMT
References: <126800008@.Prime.COM> <60345@lanl.gov> <1990Aug21.223350.7595@esegue.segue.boston.ma.us>
Distribution: usa
Organization: IR
Lines: 41
X-Original-Subject: Re: buffering, was Query

In article <1990Aug21.223350.7595@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
> In article <60345@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >From article <126800008@.Prime.COM>, by EAF@.Prime.COM:
> >> If your language I/O library is intelligent and you are reading sequential
> >> data, the language library will call on the OS to read the next disk
> >> block into memory, often before it is required.
> >Not on UNIX it won't. There is no system call for the library to use ...
[ John talks about simple caching schemes ]

I'm afraid Jim is right, though he drastically overestimates the effect
of this failure on small machines. Let me explain.

Say a program computes some numbers. Computes them optimally, in fact,
leaving them in an array. Now it wants to write the array to disk. If
the operating system weren't in the way, the program would simply call
upon the disk device to copy the data---through DMA, of course---to the
disk.

Under UNIX, there's at least one big extra step. write(fd,buf,n) first
*copies* the data to a buffer inside the kernel's space. This takes CPU
time. Do you see now what Jim is complaining about?

Of course, on most machines disk transfer is much slower than CPU
transfer, so once you've gotten rid of the disk seek by caching, any
further asynchronicity is silly.
But Jim works with very fast disks, and a lot of them at once.

mmap() is a partial solution: it does its job well and gets rid of the
extra step, but doesn't fit into the ``UNIX model'' as well as it could.
How do you use mmap() on a pipe, for example?

If two programs are communicating via a pipe, they should be able to
write data and read it with *zero* copies in the middle. Under standard
UNIX, there are two extra copies at least: one for read() and one for
write().

I've proposed a solution: make a call analogous to writev() that uses
the iovecs directly. Introduce another call that says whether a
particular iovec has been written or not. Also introduce a way to wait
on this status, similar to select(). Similarly for reading.

---Dan
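
To make the mmap() point above concrete, here is a minimal sketch of a
program that computes its numbers directly into a file-backed mapping,
so there is no separate write() call copying a user array into a kernel
buffer. It is only an illustration under assumptions: the helper name
write_squares_mapped(), the choice of squares as the "computed numbers",
and the 0644 file mode are all made up for the example, and it shows the
regular-file case only (it does nothing for the pipe case raised above).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for ftruncate(), mmap(), msync() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the post): compute n squares straight
 * into a shared mapping of `path`. Returns 0 on success, -1 on error. */
int write_squares_mapped(const char *path, size_t n)
{
    size_t len, i;
    double *a;
    int fd, rc;

    assert(path != NULL);

    /* mmap() rejects a zero length; also guard the size computation. */
    if (n == 0 || n > (size_t)-1 / sizeof(double))
        return -1;
    len = n * sizeof(double);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must already be as long as the region we map. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    if (a == MAP_FAILED)
        return -1;

    /* The "computation": results land in the mapped pages directly. */
    for (i = 0; i < n; i++)
        a[i] = (double)i * (double)i;

    rc = msync(a, len, MS_SYNC);   /* block until the pages are written */
    if (munmap(a, len) < 0)
        rc = -1;
    return rc;
}
```

A caller would use it as write_squares_mapped("squares.bin", 1024) and
check for a 0 return. The msync() with MS_SYNC is there only so the
sketch has a defined completion point; it is a blocking call, so this
shows the saved copy, not the asynchronous status-and-wait interface
proposed above.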