Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!yale!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.unix.internals
Subject: Re: On the silliness of close() giving EDQUOT
Message-ID: <12045:Oct2604:56:3290@kramden.acf.nyu.edu>
Date: 26 Oct 90 04:56:32 GMT
References: <9681:Oct2004:06:3090@kramden.acf.nyu.edu>
Organization: IR
Lines: 96

In article thurlow@convex.com (Robert Thurlow) writes:
> In <9681:Oct2004:06:3090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article thurlow@convex.com (Robert Thurlow) writes:
> >> In <24048:Oct1822:23:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >> >["I don't mind close returning -1/EINTR and -1/EBADF."]
> >> >No other close() errors make sense.
> >> So how do you pick up errors on asynchronous writes?
> >That is an excellent question. I suspect that if UNIX had truly
> >asynchronous I/O, my objections would disappear, as the whole system
> >would work much more cleanly. Somehow I'm not sure the latest mound of
> >P1003.4 is the right way to approach asynchronous I/O, but anyway...
> >``What asynchronous writes?''
> I have them by my definition - "my process can have control before my
> data has been committed to permanent storage". What definition are you
> using, and why don't you feel my write(2) calls are asynchronous?

I was asking the same question several months ago, before Larry McVoy
and Jim Giles explained to me what I was missing. Frankly, I didn't find
their explanations too elucidating at first, so here's the scoop.

``Asynchronous I/O'' can mean two different things. One is buffered I/O:
output or input that is cached on its way to or from the disk. This is
what UNIX has always had. It doesn't imply multiprocessing, or a change
in programming philosophy, or a different interface from that of the
synchronous, unbuffered I/O in a primitive OS.
For all you know, your data might never leave the buffers---or buffers
might not be used at all. The advantage of buffers is that they reduce
turnaround time: you don't block waiting for data to go to disk or to
another process, as long as it fits into a buffer.

The other is truly asynchronous I/O: reads or writes that happen
asynchronously, concurrently with your program.

The right way to see the difference between synchronous and asynchronous
I/O is to look at the lowest level of I/O programming. A synchronous
write has the CPU taking time off from executing your program. It copies
bytes of data, one by one, into an output port. An asynchronous write
has a separate I/O processor doing the work. Your CPU takes only a
moment to let the I/O processor know about a block of data; then it
returns to computation. The CPU wouldn't run any faster if there were no
I/O to do.

UNIX has never let truly asynchronous I/O show through fully to the
programmer. Although any real computer does have some sort of I/O
processor doing asynchronous reads and writes at the lowest levels, UNIX
sticks at least one level of buffering between programs and this
asynchronicity. Disk-to-buffer I/O synchronicity would have no more
impact on programs than any other scheduling problem.

Truly asynchronous I/O---without buffering---involves a change in
programming style. Since the data is not copied by the CPU, your process
has to know when it's safe to access that area of memory. This implies
that processes have to see a signal when the I/O really finishes. In
other words, truly asynchronous I/O is much closer to the level of the
machine, where scheduling I/O and waiting for a signal is the norm.

I hope this clears up what I mean when I say that UNIX doesn't have
asynchronous I/O.

(Btw, I'm finishing up a signal-schedule, aka non-preemptive threads,
library. Anyone want to see particular features in it?
It won't give you asynchronous I/O without kernel support, but it'll
provide a framework for easing async syscalls into your code.)

> > [ I object that programs can't afford to keep data around in case of ]
> > [ possible problems; errors must be returned in a timely fashion ]
> >> This is ridiculous. If a program wants to _know_ the data is secured,
> >> it can call fsync before it frees the buffers or overwrites the data.
> >I sympathize with what you're trying to say, but have you noticed that
> >fsync() isn't required to flush data over NFS, any more than write() is
> >required to return EDQUOT correctly? If write()'s errors aren't
> >accurate, I don't know how you expect fsync() to work.
> Our fsync, like Sun's, ensures there are no pages in the VM system
> marked as "dirty", and it does this by forcing and waiting for I/O
> on each such page. The I/O involves an NFS write, and any I/O errors
> are detected.

Are you sure? Suppose the remote side is a symbolic link to yet another
NFS-mounted directory. Is the fsync() really propagated?

This raises the real question: Why should I have to waste all that
traffic on periodic fsync()s, when the traffic for timely EDQUOT
detection would be a mere fraction of the amount? I can't afford to
buffer everything and do just an fsync() before the final close().

> >> "Allocations"? I won't lightly put *any* state into my NFS server, never
> >> mind state to take care of frivolities like close returning EDQUOT.
> >No good remote file system is stateless. I think every complaint I've
> >heard about NFS is caused by the ``purity'' of its stateless
> >implementation.
> No doubt, but I appreciate the advantages of the simplicity this allows.

Minor advantages at best.

> When it is clear what state we need to introduce to make a more robust
> implementation, it'll probably happen.

I hope so.

---Dan
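
For readers following the fsync() exchange above, here is a minimal
sketch of the pattern under debate: write through the buffered
interface, then call fsync() before close() so that a deferred error
such as EDQUOT has a chance to surface while the caller still holds the
data. The helper name save_checked() and its return convention are
hypothetical, not something from the thread, and whether fsync()
actually reports such errors over NFS depends on the implementation,
which is exactly the point in dispute.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not from the thread): write len
 * bytes to path and report the first error seen by open(), write(),
 * fsync(), or close(). Returns 0 on success, -1 with errno set.
 */
int save_checked(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved */
            goto fail;
        }
        p += n;                 /* partial write: advance and retry */
        len -= (size_t)n;
    }

    /*
     * A successful write() may only mean the data reached a buffer.
     * fsync() asks the system to push it out and waits for the result.
     */
    if (fsync(fd) < 0)
        goto fail;

    /* Still check close(); the descriptor is not retried on failure. */
    if (close(fd) < 0)
        return -1;
    return 0;

fail:
    {
        int saved = errno;      /* keep the original error for the caller */
        close(fd);
        errno = saved;
        return -1;
    }
}
```

A caller would keep its copy of the data until save_checked() returns 0,
and treat -1 with errno set to EDQUOT or ENOSPC as "the data was not
saved". The cost objection raised above still applies: every call pays
for a full flush.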