Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!MATHOM.GANDALF.CS.CMU.EDU!lindsay
From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Shared Memory
Keywords: stable storage
Message-ID: <7939@pt.cs.cmu.edu>
Date: 12 Feb 90 01:03:49 GMT
References: <81.25d07596@waikato.ac.nz> <13910015@hpisod2.HP.COM>
Organization: Carnegie-Mellon University, CS/RI
Lines: 51

In article <81.25d07596@waikato.ac.nz> ccc_ldo@waikato.ac.nz writes:
>In <7695@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
>mentions, as a drawback of shared memory, the example where a commit log
>for reliable transactions might be written to battery-backed memory
>instead of to disk. The idea was that, by complicating the instruction
>sequence necessary to do the write, you lessen the chance of a wild program
>corrupting the log.
>
>Lindsay says this works, but "only partially", as the wild program can
>still corrupt the log by calling the subroutine that knows how to write to
>it. It seems to me the same objection applies to an on-disk log.

In article <13910015@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>Reasonably designed software would almost certainly encounter a
>failed assertion before actually corrupting the log.

Yes, accessing disks is complicated, so that's not the difference. I
think that the disk is fundamentally more reliable, because it's slow.

Let me explain by sketching a "stable storage" scheme. Assume that a
data record R fits in a disk sector, and that when you read it back,
there is enough checksumming (at whatever levels) so that trashed data
will be known as such.

Dedicate a block of three disk sectors: call them {A,B,C}. They will
be read after reboots. Write is done by filling a buffer with {R,0,R}
and doing a three-sector write to {A,B,C}.
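
For concreteness, here is a minimal Python sketch of the scheme. Several
things in it are assumptions, not part of the post: the 512-byte sector,
CRC32 as the checksum, a bytearray standing in for the disk, and the
hypothetical names seal, unseal, write_record and recover. The crash_in
parameter is a test hook that trashes the one sector being written when
the crash hits. On the read side, only the "A and C both checksum but
differ, so use A" rule comes from the post (it is given just below); the
other recovery cases are a guess at what the scheme intends.

```python
import zlib

SECTOR_SIZE = 512            # assumed; the post only says R fits in one sector
PAYLOAD = SECTOR_SIZE - 4    # space left after a 4-byte CRC32 (assumed checksum)


def seal(record: bytes) -> bytes:
    """Pad a record to PAYLOAD bytes and append its CRC32: one full sector."""
    if len(record) > PAYLOAD:
        raise ValueError("record does not fit in one sector")
    body = record.ljust(PAYLOAD, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "big")


def unseal(sector: bytes):
    """Return the padded payload if the checksum matches, else None (trashed)."""
    body, crc = sector[:PAYLOAD], sector[PAYLOAD:]
    if zlib.crc32(body).to_bytes(4, "big") == crc:
        return body
    return None


def write_record(disk: bytearray, record: bytes, crash_in=None) -> None:
    """Write {R, 0, R} over sectors {A, B, C} at the start of `disk`.

    crash_in (hypothetical test hook) is the index of the sector being
    written when the crash hits: that sector is trashed, later ones are
    left untouched.
    """
    buf = seal(record) + bytes(SECTOR_SIZE) + seal(record)
    for i in range(3):
        lo, hi = i * SECTOR_SIZE, (i + 1) * SECTOR_SIZE
        disk[lo:hi] = buf[lo:hi]
        if crash_in == i:
            disk[lo] ^= 0xFF   # model a torn sector: one byte comes out wrong
            return


def recover(disk: bytearray) -> bytes:
    """Read both copies after a reboot and pick the record to trust."""
    a = unseal(bytes(disk[0:SECTOR_SIZE]))
    c = unseal(bytes(disk[2 * SECTOR_SIZE:3 * SECTOR_SIZE]))
    if a is not None and c is not None:
        # Same: the write completed. Different: crash while writing B,
        # so A already holds the new record (the rule from the post).
        return a
    if a is not None:
        return a               # assumed: crash while writing C, A is new
    if c is not None:
        return c               # assumed: crash while writing A, C is old
    raise IOError("both copies trashed: outside the failure model")


if __name__ == "__main__":
    disk = bytearray(3 * SECTOR_SIZE)
    write_record(disk, b"old")
    write_record(disk, b"new", crash_in=1)   # crash while writing B
    print(recover(disk).rstrip(b"\0"))       # b'new'
```

Records come back zero-padded to the payload size, hence the rstrip in
the demo. A real implementation would store a length, and would need the
three-sector write to reach the disk in A, B, C order.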

The "failure model" of the disk is that a crash may corrupt one sector
that is being written, or two, but not three.

Reading R involves reading both copies. There are several
possibilities. For example, A and C may both checksum fine, but be
different. In that case, we assume that the crash occurred while
writing B, so crash recovery uses the record in A.

This works fairly well. But why would it work better than the special
memory? I think it's because it's *slow*. The chance that the system
will be insane for duration D surely goes down as D increases.

This is bad news, because the whole point of having the memory was to
speed things up.

Of course, one way out of this bind is to claim that power failures
are the dominant field failure, and that a good power-fail trap can
make systems die in nanoseconds. Kind of an easy answer: I'd love to
hear a better way out.
--
Don		D.C.Lindsay 	Carnegie Mellon Computer Science