Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!pt.cs.cmu.edu!MATHOM.GANDALF.CS.CMU.EDU!lindsay
From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Shared Memory
Keywords: stable storage
Message-ID: <7939@pt.cs.cmu.edu>
Date: 12 Feb 90 01:03:49 GMT
References: <81.25d07596@waikato.ac.nz> <13910015@hpisod2.HP.COM>
Organization: Carnegie-Mellon University, CS/RI
Lines: 51


In article <81.25d07596@waikato.ac.nz> ccc_ldo@waikato.ac.nz writes:
>In <7695@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
>mentions, as a drawback of shared memory, the example where a commit log
>for reliable transactions might be written to battery-backed memory
>instead of to disk. The idea was that, by complicating the instruction
>sequence necessary to do the write, you lessen the chance of a wild program
>corrupting the log.
>
>Lindsay says this works, but "only partially", as the wild program can
>still corrupt the log by calling the subroutine that knows how to write to 
>it. It seems to me the same objection applies to an on-disk log.

In article <13910015@hpisod2.HP.COM> dhepner@hpisod2.HP.COM 
	(Dan Hepner) writes:
>Reasonably designed software would almost certainly encounter a
>failed assertion before actually corrupting the log.

Yes, accessing disks is complicated, so that's not the difference.

I think that the disk is fundamentally more reliable, because it's
slow.  Let me explain by sketching a "stable storage" scheme.

Assume that a data record R fits in a disk sector, and that when you
read it back, there is enough checksumming (at whatever levels) so
that trashed data will be known as such. Dedicate a block of three
disk sectors: call them {A,B,C}. They will be read after reboots.

Write is done by filling a buffer with {R,0,R} and doing a
three-sector write to {A,B,C}.  The "failure model" of the disk is
that a crash may corrupt one sector that is being written, or two,
but not three.
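
A minimal sketch of the write side in Python. It assumes a 512-byte
sector whose last four bytes hold a CRC-32, standing in for "whatever
levels" of checksumming, and it models the disk as a bytearray. The
names `make_sector` and `stable_write` are hypothetical. The slice
assignment stands in for the single three-sector transfer and does not
model a crash.

```python
import struct
import zlib

SECTOR_SIZE = 512               # assumed sector size
PAYLOAD_SIZE = SECTOR_SIZE - 4  # last 4 bytes of each sector hold a CRC-32


def make_sector(record: bytes) -> bytes:
    """Pad R to a full payload and append its CRC-32."""
    if len(record) > PAYLOAD_SIZE:
        raise ValueError("record does not fit in one sector")
    payload = record.ljust(PAYLOAD_SIZE, b"\0")
    return payload + struct.pack("<I", zlib.crc32(payload))


def stable_write(disk: bytearray, base: int, record: bytes) -> None:
    """Write {R, 0, R} to sectors A, B, C, starting at sector number `base`."""
    r = make_sector(record)
    buf = r + bytes(SECTOR_SIZE) + r        # A = R, B = all zeros, C = R
    off = base * SECTOR_SIZE
    if off < 0 or off + len(buf) > len(disk):
        raise ValueError("block {A,B,C} lies outside the disk")
    disk[off:off + len(buf)] = buf          # stands in for one three-sector write


disk = bytearray(3 * SECTOR_SIZE)           # a toy "disk" of three sectors
stable_write(disk, 0, b"txn 42")
```

Because one buffer goes out in one request, the sectors land in the
order A, B, C. The failure model depends on that ordering.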

Reading R involves reading both copies, A and C. There are several
possibilities. For example, A and C may both checksum fine, but be
different.  In that case, we assume that the crash occurred while
writing B, after A was complete, so crash recovery uses the new
record in A.
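
The recovery rule can be sketched the same way. This minimal Python
version assumes each sector is a 512-byte payload ending in a CRC-32,
and that the {A,B,C} block has been read into a bytes object. The
names `make_sector`, `check_sector`, and `stable_read` are
hypothetical. It reads the failure model as allowing at most two
*adjacent* sectors to be damaged, which is presumably why B separates A
and C. Under that reading, A and C can never both be trashed, so the
code treats that case as a violation of the model.

```python
import struct
import zlib

SECTOR_SIZE = 512               # assumed sector size
PAYLOAD_SIZE = SECTOR_SIZE - 4  # last 4 bytes of each sector hold a CRC-32


def make_sector(record: bytes) -> bytes:
    """Pad R to a full payload and append its CRC-32 (same format as the writer)."""
    payload = record.ljust(PAYLOAD_SIZE, b"\0")
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_sector(sector: bytes):
    """Return the payload if its CRC-32 matches, or None if the sector is trashed."""
    payload = sector[:PAYLOAD_SIZE]
    (stored,) = struct.unpack("<I", sector[PAYLOAD_SIZE:SECTOR_SIZE])
    return payload if zlib.crc32(payload) == stored else None


def stable_read(disk: bytes, base: int) -> bytes:
    """Recover R from A and C after a reboot; B is never consulted."""
    a = check_sector(disk[base * SECTOR_SIZE:(base + 1) * SECTOR_SIZE])
    c = check_sector(disk[(base + 2) * SECTOR_SIZE:(base + 3) * SECTOR_SIZE])
    if a is not None:
        # A checks. If C matches, the last write completed. If C is bad or
        # differs, the crash came after A was written, so A holds the new record.
        return a
    if c is not None:
        # A is trashed: the crash hit while writing A (and perhaps B),
        # so C still holds the previous record.
        return c
    raise RuntimeError("A and C both trashed: outside the assumed failure model")


# Crash after A was rewritten but before C was: both check, but they differ.
old, new = make_sector(b"txn 41"), make_sector(b"txn 42")
print(stable_read(new + bytes(SECTOR_SIZE) + old, 0).rstrip(b"\0"))  # b'txn 42'
```

A wins whenever it checks. A good A means the write got past A, so A
is at least as new as C. C is used only when A itself was hit.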

This works fairly well. But why would it work better than the special
memory? I think it's because it's *slow*. To corrupt the log, the
system must stay insane for the whole duration D of a log write, and
the chance of that surely goes down as D increases. 
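
One illustrative way to put numbers on this rests on an assumption the
argument itself does not make. Suppose that once a system goes insane,
it keeps running wild for a random time T before halting, and that T
is exponentially distributed with mean tau. A wild write that takes
time D can only finish if the insanity lasts at least D. The write
must also actually hit the log, which explains the inequality:

```latex
P(\text{wild write completes}) \le P(T \ge D) = e^{-D/\tau}
```

Take a purely hypothetical tau = 1 ms. A 10 ms disk write then gives
e^{-10}, about 4.5 x 10^-5. A 100 ns memory write gives e^{-10^-4},
about 0.9999. Under this model, the slow path filters out almost
every wild write and the fast path filters out almost none.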

This is bad news, because the whole point of having the memory was to
speed things up.

Of course, one way out of this bind is to claim that power failures
are the dominant field failure, and that a good power-fail trap can
make systems die in nanoseconds. Kind of an easy answer: I'd love to
hear a better way out.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science