Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!samsung!olivea!oliveb!amdahl!JUTS!rbw00
From: rbw00@ccc.amdahl.com ( 213 Richard Wilmot)
Newsgroups: comp.arch
Subject: Re: Incremental sync()s and using disk idle time
Summary: Some applications sometimes need absolute control
Message-ID:
Date: 13 Mar 91 16:14:00 GMT
References: <1991Mar8.142031.9098@bellcore.bellcore.com> <107340003@hpcuhc.cup.hp.com>
Reply-To: rbw00@JUTS.ccc.amdahl.com ( 213 Richard Wilmot)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 90

In article <107340003@hpcuhc.cup.hp.com> dhepner@hpcuhc.cup.hp.com (Dan Hepner) writes:
>From: rbw00@ccc.amdahl.com ( 213 Richard Wilmot)
>
>>I see some problems with transaction processing systems which rely on
>>being able to absolutely control the timing of disk writes. Some (the
>>more efficient ones only need do this for their logs/journals) while others
>>want to flush out all changes made by a transaction and ensure that
>>it all got there before sending a terminal reply or dispensing the
>>ATM cash.
>
>All common DBMS SW rely on a need for notification when the write
>is in fact completed, although they may be willing to write the log
>immediately and the rest of it more asynchronously. That such
>notification is not available seems to be a real deficiency in
>the current crop of caching disk controllers.
>
>> There may be more problems with the more efficient systems
>>because although they don't insist on flushing out all database changes
>>to disk on termination of each transaction, they RELY ON NOT HAVING ANY
>>UNCOMMITTED (UNFINISHED) CHANGES WRITTEN TO DISK. That is, if the system
>>crashed, then an advanced transaction system would expect to see NONE
>>of the changes made by any incomplete transactions from before the crash.
>
>Agreed.
>The drives we're familiar with do in fact support a synchronous access,
>either by request or "setting the controller in that mode".
>For all database usage, we intend that any controller caching be
>bypassed. This does however leave an assumption that "somebody else"
>must be able to make use of that caching, because OLTP sure can't.
>It's also worth noting that a battery backed up controller cache
>might turn out to be vastly more interesting.

I was more worried about file systems than disk controllers, but disk
controllers can be worrisome if they don't allow bypassing of cache
functions or make the OLTP system pay a significant performance penalty
for doing so. OLTP system performance is particularly sensitive to
WRITE latency for logging (recovery journal information). This performance
can be greatly augmented through appropriate use of non-volatile
memory in the controller so that writes as well as reads can be cached.
In fact it is then generally easier to cache writes than reads.

>
>>If a file system cannot accommodate this kind of use then the transaction
>>system implementors will again be forced into using raw I/O - to
>>avoid the file system.
>>Alas, RAW I/O is still the answer for most database/transaction systems.
>
>This is perhaps the biggest trap of all. Using raw IO has nothing
>to do with the behavior of a disk controller unless one has specifically
>modified one's kernel to do something special, such as post all
>raw writes as synchronous writes. The default behavior will be for
>raw writes to be treated like any other write; the disk controller
>doesn't know or care where this write came from.
>
>>They keep their own set of buffers and file structures. It need not be
>>so if the file system incorporates the semantic needs of transaction/database
>>systems.
>
>Do you actually recommend this, for which configurations, and for
>which reasons?

What I recommend is that file systems be constructed so as to support
truly synchronous operation when required. Many file systems DO NOT
REALLY SUPPORT SYNCHRONOUS I/O OR DO IT INAPPROPRIATELY.
A synch operation which merely adds your request to a software queue
to be done as soon as convenient does not solve the problem. Some I/O
in some applications (e.g. Online Transaction Processing, OLTP) is
crucial to correct system operation and interference with such
requirements by the file system software or disk controller
hardware/software will lead to not using it for those applications.
I will consider the problem addressed when most OLTP/DBMS software
vendors always use the operating system supplied file system.

As another post from my organization notes, we are trying to provide
the structure to allow those vendors to do just that. Other providers
of file systems and/or disk controllers are well advised to consider
these same needs. It will help customers who must implement and manage
online transaction and database systems.

>
>> Dick Wilmot | I declaim that Amdahl might disclaim any of my claims.
>> (408) 746-6108
>
>Dan Hepner
>Not a statement of the Hewlett Packard Co.

--
Dick Wilmot | I declaim that Amdahl might disclaim any of my claims.
(408) 746-6108
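
Editorial illustration, not part of the original post: the ordering the
thread argues about (the log record must reach stable storage before the
terminal reply or the ATM cash goes out) might look roughly like the
sketch below. The names append_durably and commit, the record layout, and
the reply callback are all hypothetical. The sketch also assumes that
os.fsync() really does return only after the data is on the device, which
is exactly the guarantee the post says many file systems and caching
controllers fail to give.

```python
import os


def append_durably(log_path: str, record: bytes) -> int:
    """Append one record to the log; return only after fsync() has returned."""
    # O_APPEND places each write at the current end of the file.
    # O_CREAT only makes this sketch self-contained.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        written = 0
        while written < len(record):
            # os.write may accept fewer bytes than requested, so loop.
            written += os.write(fd, record[written:])
        # Ask the operating system to push the data to the device.
        # ASSUMPTION: the file system and controller honor this request;
        # the call itself cannot prove that a volatile cache was bypassed.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


def commit(log_path: str, txn_id: int, payload: bytes, reply) -> None:
    """Log first, reply second (hypothetical record format)."""
    record = b"COMMIT %d " % txn_id + payload + b"\n"
    append_durably(log_path, record)
    # Only now is it safe to send the terminal reply or dispense the cash.
    reply(txn_id)
```

A caller would pass its own reply function, for example
commit("txn.log", 7, b"debit 100", send_reply), where send_reply is
whatever the application uses to answer the terminal. Data pages can
still be written lazily. Only the log append sits on the latency-critical
path, which is why write latency for logging matters so much to OLTP.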