Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!lll-winken!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhc!dhepner
From: dhepner@hpcuhc.cup.hp.com (Dan Hepner)
Newsgroups: comp.arch
Subject: Re: Re: Incremental sync()s and using disk idle time
Message-ID: <107340003@hpcuhc.cup.hp.com>
Date: 12 Mar 91 01:55:13 GMT
References: <1991Mar8.142031.9098@bellcore.bellcore.com>
Organization: Hewlett Packard, Cupertino
Lines: 55

From: rbw00@ccc.amdahl.com (Richard Wilmot)

>I see some problems with transaction processing systems which rely on
>being able to absolutely control the timing of disk writes. Some (the
>more efficient ones only need do this for their logs/journals) while others
>want to flush out all changes made by a transaction and ensure that
>it all got there before sending a terminal reply or dispensing the
>ATM cash.

All common DBMS software relies on being notified when a write has
actually completed, although it may be willing to write the log
immediately and the rest of the data more asynchronously. That such
notification is not available seems to be a real deficiency in the
current crop of caching disk controllers.

>There may be more problems with the more efficient systems
>because although they don't insist on flushing out all database changes
>to disk on termination of each transaction, they RELY ON NOT HAVING ANY
>UNCOMMITTED (UNFINISHED) CHANGES WRITTEN TO DISK. That is, if the system
>crashed, then an advanced transaction system would expect to see NONE
>of the changes made by any incomplete transactions from before the crash.

Agreed. The drives we're familiar with do in fact support synchronous
access, either per request or by "setting the controller in that mode".
For all database usage, we intend that any controller caching be
bypassed. This does, however, assume that "somebody else" can make use
of that caching, because OLTP sure can't.
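To make the log-force requirement concrete, here is a minimal sketch (not taken from any particular DBMS) of a commit path that appends a commit record and waits for fsync() to return before replying. The helper name commit_transaction and the one-record-per-line log format are hypothetical. The sketch assumes that a completed fsync() means the record is on stable media; a caching controller that acknowledges writes from volatile memory before they reach the platter silently breaks exactly that assumption.

```python
import os


def commit_transaction(log_path: str, record: bytes) -> None:
    """Append a commit record to a log file and return only after fsync().

    Hypothetical helper for illustration. Assumes that fsync() returning
    means the record is on stable storage; a disk controller that
    acknowledges writes from a volatile cache defeats this assumption.
    """
    # O_APPEND makes each write land at the current end of the log.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:
            # os.write may accept fewer bytes than offered; keep going.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush the file's data and metadata to the
        # device; this call blocks until that flush is reported complete.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Only after this point is it safe to send the terminal reply
    # or dispense the ATM cash.


if __name__ == "__main__":
    # Force the commit record; data pages could be written back later.
    commit_transaction("txn.log", b"T42 COMMIT\n")
```

Only the log write sits on the critical path to the reply. Data pages can be written back lazily without an fsync, because after a crash the committed changes can be redone from the log. In Python, os.fsync is a thin wrapper over the POSIX fsync() call.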
It's also worth noting that a battery-backed controller cache might turn
out to be vastly more interesting, since it could safely acknowledge a
write before the data reaches the platter.

>If a file system cannot accommodate this kind of use then the transaction
>system implementors will again be forced into using raw I/O - to
>avoid the file system.

>Alas, RAW I/O is still the answer for most database/transaction systems.

This is perhaps the biggest trap of all. Using raw I/O has nothing to do
with the behavior of a disk controller unless one has specifically
modified one's kernel to do something special, such as posting all raw
writes as synchronous writes. By default, raw writes are treated like
any other write; the disk controller doesn't know or care where a write
came from.

>They keep their own set of buffers and file structures. It need not be
>so if the file system incorporates the semantic needs of transaction/database
>systems.

Do you actually recommend this, for which configurations, and for which
reasons?

> Dick Wilmot | I declaim that Amdahl might disclaim any of my claims.
> (408) 746-6108

Dan Hepner
Not a statement of the Hewlett Packard Co.