Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!yale!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.unix.internals
Subject: Re: On the silliness of close() giving EDQUOT
Message-ID: <12045:Oct2604:56:3290@kramden.acf.nyu.edu>
Date: 26 Oct 90 04:56:32 GMT
References: <thurlow.656303314@convex.convex.com> <9681:Oct2004:06:3090@kramden.acf.nyu.edu> <thurlow.656468483@convex.convex.com>
Organization: IR
Lines: 96

In article <thurlow.656468483@convex.convex.com> thurlow@convex.com (Robert Thurlow) writes:
> In <9681:Oct2004:06:3090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <thurlow.656303314@convex.convex.com> thurlow@convex.com (Robert Thurlow) writes:
> >> In <24048:Oct1822:23:2090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >> >["I don't mind close returning -1/EINTR and -1/EBADF."]
> >> >No other close() errors make sense.
> >> So how do you pick up errors on asynchronous writes?
> >That is an excellent question. I suspect that if UNIX had truly
> >asynchronous I/O, my objections would disappear, as the whole system
> >would work much more cleanly. Somehow I'm not sure the latest mound of
> >P1003.4 is the right way to approach asynchronous I/O, but anyway...
> >``What asynchronous writes?''
> I have them by my definition - "my process can have control before my
> data has been committed to permanent storage".  What definition are you
> using, and why don't you feel my write(2) calls are asynchronous?

I was asking the same question several months ago, before Larry McVoy
and Jim Giles explained to me what I was missing. Frankly, I didn't find
their explanations too elucidating at first, so here's the scoop.

``Asynchronous I/O'' can mean two different things. One is buffered
I/O: output or input that is cached on its way to or from the disk.
This is what UNIX has always had. It doesn't imply multiprocessing, or
a change in programming philosophy, or a different interface from that
of the synchronous, unbuffered I/O in a primitive OS. For all you know,
your data might never leave the buffers---or buffers might not be used
at all. The advantage of buffers is that they reduce turnaround time:
you don't block waiting for data to go to disk or to another process,
as long as it fits into a buffer.
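
To make the buffered case concrete, here is a minimal sketch in C. The
helper save_and_report() and its "stage" argument are names made up for
this illustration; only open(), write(), fsync() and close() are real.
It assumes a POSIX-ish system. The point is only that a successful
write() says the data reached a buffer, so a later call may be the
first one to report trouble.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: write len bytes to path
 * and report which call first admitted a failure.
 * Returns 0 on success with *stage set to "done", or -1 with *stage set
 * to "open", "write", "fsync" or "close" and errno describing the error.
 */
int save_and_report(const char *path, const char *data, size_t len,
                    const char **stage)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        *stage = "open";
        return -1;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;          /* keep the write() error, not close()'s */
            *stage = "write";
            return -1;
        }
        done += (size_t)n;
    }

    /* Every write() succeeded, but that only means the bytes were
     * copied into a buffer. Ask for them to be pushed out. */
    if (fsync(fd) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        *stage = "fsync";
        return -1;
    }

    /* On some file systems this is where a deferred error shows up. */
    if (close(fd) == -1) {
        *stage = "close";
        return -1;
    }

    *stage = "done";
    return 0;
}
```

A caller that must not lose data has to keep it around until the call
returns 0, which is exactly the cost under discussion.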

Another is truly asynchronous I/O: reads or writes that happen
asynchronously, concurrently with your program. The right way to see the
difference between synchronous and asynchronous I/O is to look at the
lowest level of I/O programming. A synchronous write has the CPU taking
time off from executing your program. It copies bytes of data, one by
one, into an output port. An asynchronous write has a separate I/O
processor doing the work. Your CPU takes only a moment to let the I/O
processor know about a block of data; then it returns to computation.
The CPU wouldn't run any faster if there were no I/O to do.

UNIX has never let truly asynchronous I/O show through fully to the
programmer. Although any real computer does have some sort of I/O
processor doing asynchronous reads and writes at the lowest levels, UNIX
sticks at least one level of buffering between programs and this
asynchronicity. Whether disk-to-buffer I/O is synchronous or not has no
more impact on programs than any other scheduling problem.

Truly asynchronous I/O---without buffering---involves a change in
programming style. Since the data is not copied by the CPU, your process
has to know when it's safe to access that area of memory. This implies
that processes have to see a signal when the I/O really finishes. In
other words, truly asynchronous I/O is much closer to the level of the
machine, where scheduling I/O and waiting for a signal is the norm.
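
As a rough sketch of that style, assume an <aio.h> interface of the
kind that later came out of the POSIX real-time work (aio_write(),
aio_error(), aio_suspend(), aio_return()). The helper name
async_write_then_wait() is made up for this example. It waits with
aio_suspend() rather than taking a signal, purely to keep the sketch
short, and a given library may well implement these calls with threads
and buffers underneath, so this shows the programming style, not a
guarantee of truly asynchronous hardware I/O.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: start one asynchronous
 * write of buf[0..len) at offset 0 of path, then wait for it to finish.
 * Returns the number of bytes written, or -1 with errno set.
 */
ssize_t async_write_then_wait(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    /* Ask for no completion signal; a zeroed sigevent is not
     * guaranteed to mean "none" on every system. */
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* Schedule the write. This returns at once. */
    if (aio_write(&cb) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* From here until completion the request owns buf: the process may
     * compute, but must not modify or free that memory. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);     /* sleep until the request is done */

    int err = aio_error(&cb);           /* 0, or the errno of the failure */
    ssize_t got = aio_return(&cb);      /* collect the result exactly once */
    close(fd);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return got;
}
```

Note where the error appears: at completion, attached to the request,
not at the scheduling call and not at close().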

I hope this clears up what I mean when I say that UNIX doesn't have
asynchronous I/O. (Btw, I'm finishing up a signal-schedule, aka
non-preemptive threads, library. Anyone want to see particular features
in it? It won't give you asynchronous I/O without kernel support, but
it'll provide a framework for easing async syscalls into your code.)

> >  [ I object that programs can't afford to keep data around in case of ]
> >  [ possible problems; errors must be returned in a timely fashion ]
> >> This is ridiculous.  If a program wants to _know_ the data is secured,
> >> it can call fsync before it frees the buffers or overwrites the data.
> >I sympathize with what you're trying to say, but have you noticed that
> >fsync() isn't required to flush data over NFS, any more than write() is
> >required to return EDQUOT correctly? If write()'s errors aren't
> >accurate, I don't know how you expect fsync() to work.
> Our fsync, like Sun's, ensures there are no pages in the VM system
> marked as "dirty", and it does this by forcing and waiting for I/O
> on each such page.  The I/O involves an NFS write, and any I/O errors
> are detected.

Are you sure? Suppose the remote side is a symbolic link to yet another
NFS-mounted directory. Is the fsync() really propagated?

This raises the real question: Why should I have to waste all that traffic
on periodic fsync()s, when the traffic for timely EDQUOT detection would
be a mere fraction of the amount? I can't afford to buffer everything
and do just an fsync() before the final close().

> >> "Allocations"?  I won't lightly put *any* state into my NFS server, never
> >> mind state to take care of frivolities like close returning EDQUOT.
> >No good remote file system is stateless. I think every complaint I've
> >heard about NFS is caused by the ``purity'' of its stateless
> >implementation.
> No doubt, but I appreciate the advantages of the simplicity this allows.

Minor advantages at best.

> When it is clear what state we need to introduce to make a more robust
> implementation, it'll probably happen.

I hope so.

---Dan