Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: Ignorance speaks loudest (was:Computers for users not programmers)
Message-ID: <3200@crdos1.crd.ge.COM>
Date: 14 Feb 91 14:49:28 GMT
References: <1991Feb4.210853.22139@odin.corp.sgi.com> <1652@hpwala.wal.hp.com>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 85

In article <4033@stl.stc.co.uk> tom@nw.stl.stc.co.uk writes:

| (i) buffering writes in the OS means I don't know when a transfer has
| terminated. That's a real pain if I'm trying to maintain some sort of
| consistency and recoverability for the data.

True. Ordered writes and waiting for i/o completion cover this. Async
i/o by itself doesn't help, and may even make this problem worse, since
the order of completion of async i/o is indeterminate. You have to wait
for completion (sync) at intervals.

| (ii) buffering writes in the OS means an extra copy. Usually lots of extra
| slave misses, waste cache space, etc, as well as waste store and bus bandwidth;
| we can't always afford to throw away performance like this.

What you say is true, as long as the "always" is noted. The price for
async i/o is twofold: in code complexity, and in that you have to
buffer the data in the block size used by the hardware. There are cases
where the hardware is marginal for the task, and this makes sense. In
general, programs may not find it convenient to do i/o in sector-sized
units, particularly since that means changing the code depending on
which disk is being used, or using a fairly large buffer. It also
should be noted that the overhead of the user moving stuff into a
buffer is probably a lot higher than that of the library routines
making a call and passing a buffer of fairly large size to the o/s.
Note too that if the hardware supports it, the buffer need not be
copied at all, but simply dropped from the user's address space, to be
copied and remapped only if the user program accesses it before the i/o
is complete. So in many cases the larger part of the overhead is still
present.

The summary of my answer to this is that the savings may not justify
the effort, and that the copy by the o/s helps the realtime performance
greatly, while using very little processor. I typically see 4-7% system
cpu usage on my UNIX systems, so that's the upper bound on what I could
save (on those systems).

| (iii) Async io can make enormous differences in run time for io bound jobs -
| bd's conventional "wisdom" is an example of the rubbish that so-called software
| architects have foisted off on an unsuspecting world as knowledge. It's simple,
| really: if I am reading from 5 different discs, I can overlap the seek times,
| search times, and even channel times if they are split over two or more
| channels. Without asynch IO I can't get any of this overlap - the parallelism
| between discs, between controllers, and between channels has been stolen from
| me. So async io gives me maybe a 5 times speed up for this hypothetical
| five disc job, if it's io bound.

This is true if you are designing a system for running a single job
which reads from five disks. That's not typical. By the time a system
gets to be large enough to have five drives, it's unlikely to have only
a single job running. While you could save something if you could get
the whole resources of the machine, under load your i/o would sit in
the queue long enough that you might not see any improvement at all.
See below for comments on special cases.

| (iv)The speed up for cpu bound jobs may be less spectacular, but it's there;
| if some elapsed time is critical, a 1% reduction may matter.
| The argument is crazy anyway: BD is suggesting that because I'm CPU
| bound I can afford to use synchronous IO and let the CPU stand idle
| while I wait for some macroscopic mechanical events - chuck away some
| of precisely that resource I'm short of.

You can always come up with a special case to justify anything. Vendors
will sell you solutions to these special cases, but why try to convince
people that the solutions to atypical problems need to be thrust upon
the average user?

This was originally a discussion of the hypothetical shortcomings of
UNIX i/o, and since there are versions which allow async i/o, and
versions which have lightweight processes, and versions which allow
shared memory (so another process can issue the i/o into your buffers),
I think solutions are clearly available for UNIX.

I spent a decade programming on systems which did async i/o, and all
the users used routines which made it look just like traditional UNIX,
because non-programmers don't think clearly about multiple threads.
Sure, when I wrote the device interface I had three printers, a punch,
and a card reader, with a total of six buffers for all of them, and a
smooth shift from double buffering to single buffering as the number of
devices in use increased, but the users didn't know or care about it.
And even the kernel hackers thought it was a bit complex ;-)
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  "I'll come home in one of two ways, the big parade or in a body bag.
   I prefer the former but I'll take the latter" -Sgt Marco Rodrigez
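
As a rough illustration of the overlap argued about in (iii), and of the
point that lightweight processes can issue the i/o on a program's
behalf, here is a minimal Python sketch. It is a simulation under stated
assumptions, not a measurement: each "disk" is a hypothetical stand-in
that sleeps for an assumed 50 ms to mimic seek plus transfer, and the
names read_block, read_sequential, read_overlapped and timed are
invented for this example. It shows only the idle single-job case; it
says nothing about a loaded machine where requests queue behind other
jobs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SEEK_TIME = 0.05  # assumed per-request latency in seconds (illustrative only)
DISKS = 5         # the hypothetical five-disc job from point (iii)


def read_block(disk):
    """Stand-in for a blocking read: sleep to mimic seek plus transfer."""
    time.sleep(SEEK_TIME)
    return f"block from disk {disk}"


def read_sequential():
    """Synchronous reads issued one after another; the latencies add up."""
    return [read_block(d) for d in range(DISKS)]


def read_overlapped():
    """One worker thread per disk, so the blocking reads wait concurrently."""
    with ThreadPoolExecutor(max_workers=DISKS) as pool:
        # pool.map returns results in the order of the inputs
        return list(pool.map(read_block, range(DISKS)))


def timed(fn):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(read_sequential)
    _, t_ovl = timed(read_overlapped)
    print(f"sequential: {t_seq:.3f}s  overlapped: {t_ovl:.3f}s")
```

Each worker still makes an ordinary blocking call, so the caller keeps
the traditional synchronous view of i/o while the waits overlap. How
much of that gain survives on a shared system is exactly what the post
disputes.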