Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!clyde.concordia.ca!nstn.ns.ca!news.cs.indiana.edu!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!ncar!ames!pioneer.arc.nasa.gov!lamaster
From: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Ignorance speaks loudest (was:Computers for users not programmers)
Message-ID: <1991Feb14.195906.5726@news.arc.nasa.gov>
Date: 14 Feb 91 19:59:06 GMT
References: <1991Feb4.210853.22139@odin.corp.sgi.com> <1652@hpwala.wal.hp.com> <3200@crdos1.crd.ge.COM>
Sender: usenet@news.arc.nasa.gov (USENET Administration)
Reply-To: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Organization: NASA Ames Res. Ctr. Mtn Vw CA 94035
Lines: 68

In article <3200@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <4033@stl.stc.co.uk> tom@nw.stl.stc.co.uk writes:

>| (iii) Async io can make enormous differences in run time for io bound jobs -

> This is true if you are designing a system for running a single job
>which reads from five disks. That's not typical. By the time a system
>gets to be large enough to have five drive, it's unlikely to have only a
>single job running. While you could save something if you could get the

Actually, jobs like this are very, very typical in our environment.
As CPU speed increases, all those CPU bound jobs turn into I/O bound
jobs (not an original thought on my part :-) ). Your second observation
usually applies during the daytime. Often at night, even modest
minicomputers and servers are used to run long jobs. Often, these jobs
are I/O bound.

> You can always come up with a special case to justify anything.

True. But, this is not really a question of a special case. Anyone
using a system for "data processing" applications, as opposed to
"text processing", terminal/screen/user-interface/etc. processing,
runs up against the fact that disks are, and have always been, slow,
when doing *random* I/O.
It is amusing to see that what were always considered the traditional
uses of computers are now considered "special cases". But, that is only
true if you approach computing from a text-processing viewpoint. What
is, in fact, happening, is that Unix machines, once considered suitable
*only* for text processing, have now "grown up", and are being used for
data processing as well: including scientific data processing, image
processing, transaction processing, general DBMS uses, etc. For many
of these purposes, async I/O is important.

> This was originally a discussion of the hypothetical shortcomings of
>UNIX i/o, and since there are versions which allow async i/o, and

I agree. There are now versions of Unix which have async I/O. These
versions were made available because a lot of mainstream computer users
needed it. More versions of Unix will include async I/O in the future.
(Is this prophecy?)

> I spent a decade programming on systems which did async i/o, and all
>the users used routines which made it look just like traditional UNIX,
>because non-programmers don't think clearly about multiple threads.

Your users must be different from our users. Many of our users have
always been intensely interested in optimizing programs for both CPU
time and I/O time, and have often gone to considerable lengths to write
their programs to be very efficient wrt I/O, including the use of
async I/O as appropriate.

******* Fault tolerance, an O/S issue or an Architecture issue? ************

To follow up on Jim Giles' original posting: It is an open question as
to whether what used to be called "recovery of rolled jobs", "user
checkpointing", etc. really makes sense anymore. It was a good idea
when the MTBF of a CPU was four hours, and we had on-site C.E.'s to fix
the hardware in a hurry. With MTBF's of *months* on many systems, I'm
not sure it is a good idea. How many people frequently have a long
running job die in a hardware related crash now?
How many of those jobs hadn't modified any files yet? There are a lot
of subtleties to doing it right. And, to provide *real* fault
tolerance, if you need it, *is* a major hardware/software/architectural
issue.

  Hugh LaMaster, M/S 233-9,   UUCP:              ames!lamaster
  NASA Ames Research Center   Internet:          lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     With Good Mailer:  lamaster@george.arc.nasa.gov
  Phone: 415/604-6117