Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: Ignorance speaks loudest (was:Computers for users not programmers)
Message-ID: <3200@crdos1.crd.ge.COM>
Date: 14 Feb 91 14:49:28 GMT
References: <1991Feb4.210853.22139@odin.corp.sgi.com> <1652@hpwala.wal.hp.com>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 85

In article <4033@stl.stc.co.uk> tom@nw.stl.stc.co.uk writes:

| (i) buffering writes in the OS means I don't know when a transfer has
| terminated.  That's a real pain if I'm trying to maintain some sort of
| consistency and recoverability for the data.

  True. Ordered writes and waiting for i/o completion cover this. Async
i/o by itself doesn't help, and may make this problem worse, since the
order of completion of async i/o is indeterminate. You have to wait for
completion (sync) at intervals.

| (ii) buffering writes in the OS means an extra copy. Usually lots of extra
| slave misses, waste cache space, etc, as well as waste store and bus bandwidth;
| we can't always afford to throw away performance like this.

  What you say is true, as long as the "always" is noted. The price for
async i/o is twofold: added code complexity, and having to buffer the
data in the block size used by the hardware. There are cases where the
hardware is marginal for the task, and then this makes sense. In
general, programs may not find it convenient to do i/o in sector-size
units, particularly since that means either changing the code depending
on which disk is being used or using a fairly large buffer.
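
  One way to avoid hard-coding a block size per disk is to ask the
system at run time. The sketch below assumes a POSIX system and uses
the st_blksize field from fstat(). Note that st_blksize is the
preferred i/o size reported for the file, which is not necessarily the
hardware sector size. The function names are hypothetical.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Round len up to the next multiple of blk (blk must be nonzero). */
size_t round_up_to_block(size_t len, size_t blk)
{
    return (len + blk - 1) / blk * blk;
}

/*
 * Ask the system for the preferred i/o block size of an open file.
 * Returns 0 if fstat() fails or reports no usable value.
 */
size_t preferred_block_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0 || st.st_blksize <= 0)
        return 0;
    return (size_t)st.st_blksize;
}
```

  A program would size its buffer with something like
round_up_to_block(want, preferred_block_size(fd)) after checking for a
zero return, which is exactly the extra bookkeeping mentioned above.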

  It also should be noted that the overhead of the user moving stuff
into a buffer is probably a lot higher than the library routines making
a call and passing a buffer of fairly large size to the o/s. And that if
the hardware supports it the buffer may not be copied, but simply
dropped from the user's address space, to be copied and remapped if the
user program accesses it before the i/o is complete. So in many cases
the larger part of the overhead is still present.

  The summary of my answer to this is that the savings may not justify
the effort, and that the copy by the o/s helps the realtime performance
greatly, while using very little processor time. I typically see 4-7%
system cpu usage on my UNIX systems, so that's the upper bound on what
I could save (on those systems).

| (iii) Async io can make enormous differences in run time for io bound jobs -
| bd's conventional "wisdom" is an example of the rubbish that so-called software
| architects have foisted off on an unsuspecting world as knowledge. It's simple,
| really: if I am reading from 5 different discs, I can overlap the seek times,
| search times, and even channel times if they are split over two or more
| channels. Without asynch IO I can't get any of this overlap - the parallelism
| between discs, between controllers, and between channels has been stolen from
| me.  So async io gives me maybe a 5 times speed up for this hypothetical
| five disc job, if it's io bound.

  This is true if you are designing a system for running a single job
which reads from five disks. That's not typical. By the time a system
gets to be large enough to have five drives, it's unlikely to have only
a single job running. While you could save something if you had the
whole resources of the machine to yourself, under load you would wait
long enough for the i/o to be queued that you might not see any
improvement at all.
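
  The "5 times" figure in the quoted text comes from a simple idealized
model: issued one at a time, the requests take the sum of their service
times; fully overlapped on independent disks, they take the longest
single one. The sketch below only does that arithmetic. It assumes no
queueing behind other jobs and no channel contention, which is
precisely what a loaded system does not give you. The function names
are hypothetical.

```c
#include <stddef.h>

/* Elapsed time when requests are issued one after another: the sum. */
double serial_time(const double *ms, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += ms[i];
    return total;
}

/*
 * Elapsed time when all requests are outstanding at once on
 * independent disks: the slowest request. Returns 0 for n == 0.
 */
double overlapped_time(const double *ms, size_t n)
{
    double longest = 0.0;

    for (size_t i = 0; i < n; i++)
        if (ms[i] > longest)
            longest = ms[i];
    return longest;
}
```

  With five requests of 20 ms each the model gives 100 ms serial
against 20 ms overlapped, the full factor of five. With unequal times
of 10, 20, 30, 40 and 50 ms it gives 150 ms against 50 ms, a factor of
three, and any time spent queued behind other jobs is added to both
sides.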

  See below for comments on special cases.

| (iv)The speed up for cpu bound jobs may be less spectacular, but it's there;
| if some elapsed time is critical, a 1% reduction may matter. The argument is
| crazy anyway: BD is suggesting that because I'm CPU bound I can afford to use
| synchronous IO and let the CPU stand idle while I wait for some macroscopic
| mechanical events - chuck away some of precisely that resource I'm short of.

  You can always come up with a special case to justify anything.
Vendors will sell you solutions to these special cases, but why try to
convince people that the solutions to atypical problems need to be
thrust upon the average user?

  This was originally a discussion of the hypothetical shortcomings of
UNIX i/o, and since there are versions which allow async i/o, and
versions which have lightweight processes, and versions which allow
shared memory (so another process can issue the i/o into your
buffers), I think solutions are clearly available for UNIX.

  I spent a decade programming on systems which did async i/o, and all
the users used routines which made it look just like traditional UNIX,
because non-programmers don't think clearly about multiple threads.
Sure, when I wrote the device interface I had three printers, a punch,
and a card reader, with a total of six buffers for all of them, and a
smooth shift from double buffering to single buffering as the number of
devices in use increased, but the users didn't know or care about it.
And even the kernel hackers thought it was a bit complex ;-)
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  "I'll come home in one of two ways, the big parade or in a body bag.
   I prefer the former but I'll take the latter" -Sgt Marco Rodrigez