Path: utzoo!dciem!array!colin
From: colin@array.UUCP (Colin Plumb)
Newsgroups: comp.arch
Subject: Re: Mixing paging and IO is inefficient (was Re:  Compiler partions)
Message-ID: <103@array.UUCP>
Date: 4 Jul 90 00:25:36 GMT
References: <499@garth.UUCP> <5660@titcce.cc.titech.ac.jp> <137770@sun.Eng.Sun.COM>
Organization: Array Systems Computing, Inc., Toronto, Ontario, CANADA
Lines: 65

In article <137770@sun.Eng.Sun.COM> lm@sun.UUCP (Larry McVoy) writes:
>I'm really getting sick of this thread.  Those who understand file system
>semantics dismissed this idea as flawed from the start.  The synchronous
>nature of certain file system writes is *required* for file system
>reliability.

Depending on the failure modes you're considering, the Unix file
system simply isn't reliable.  If you assume that block writes are
atomic, then enforcing sequencing will prevent thoroughly bogus
file system structures, although protection violations (a file
gets extended, a new block pointer gets added to the inode, and the
system crashes before the block, which used to contain unencrypted
passwords, gets overwritten) are certainly still possible.

> Just so you understand: consider what happens when you create
>a file.  You allocate an inode and add a directory entry.  Think about the
>steps and the order of operations.  If you do it wrong, and the system
>crashes, you leave dangling pointers.  Before you tell me that systems don't
>crash very often and this isn't a problem, think back to what things were
>like when we all knew how to use fsdb.  (If you never knew fsdb, you have no
>business in this discussion).  The reason that this isn't a problem anymore
>is that we fixed it.  You are suggesting that we undo the fix.  That might
>be acceptable in your environment (How is that ETA, anyway, it's been a
>while since I've logged on) but it is not acceptable for most customers.
>Most sites want both performance and robustness - if there is a conflict
>they will sacrifice performance for robustness.

In that particular case, if I'm allowed to assume that inodes are
marked "unused" when deleted, the order doesn't matter.  On the
power-up fsck, I will see either an unreferenced inode (if I
allocate the inode, then crash before updating the directory) or a
directory entry pointing to a free inode (if I update the directory
first).  The latter I can either abort, by removing the directory
entry, or commit, by marking the inode as in use.  Adding a data
block to a directory might be a better example.
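A toy model of the file-creation case shows why either write order is recoverable.  This is only a sketch under loud assumptions: the "file system" is one inode table plus one directory, each update is a single atomic write, and a crash simply drops the second write.  ToyFS, create, fsck and consistent are hypothetical names, not real Unix interfaces, and freeing the orphaned inode is a simplification (a real fsck might instead reconnect it under lost+found).

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical toy model: inode number -> "used"/"unused", name -> inode number."""
    inodes: dict = field(default_factory=dict)
    directory: dict = field(default_factory=dict)


def create(fs, name, ino, inode_first=True, crash=False):
    """Create a file as two separate writes; if crash is True, only the first lands."""
    def mark_inode():
        fs.inodes[ino] = "used"

    def add_entry():
        fs.directory[name] = ino

    first, second = (mark_inode, add_entry) if inode_first else (add_entry, mark_inode)
    first()
    if not crash:
        second()


def fsck(fs, commit=True):
    """Repair either half-finished state that create() can leave behind."""
    # Directory written first: the entry points at an inode still marked unused.
    for name, ino in list(fs.directory.items()):
        if fs.inodes.get(ino, "unused") != "used":
            if commit:
                fs.inodes[ino] = "used"   # commit: finish the create
            else:
                del fs.directory[name]    # abort: drop the entry
    # Inode written first: a used inode that no directory entry references.
    referenced = set(fs.directory.values())
    for ino, state in fs.inodes.items():
        if state == "used" and ino not in referenced:
            fs.inodes[ino] = "unused"     # simplification: free the orphan


def consistent(fs):
    """True when exactly the referenced inodes are marked used."""
    used = {ino for ino, state in fs.inodes.items() if state == "used"}
    return used == set(fs.directory.values())


if __name__ == "__main__":
    for inode_first in (True, False):
        fs = ToyFS()
        create(fs, "notes", 7, inode_first=inode_first, crash=True)
        before = consistent(fs)
        fsck(fs)
        print(inode_first, before, consistent(fs))
```

Whichever write hits the disk first, the crashed state is inconsistent, and fsck can always move it to a consistent one, which is the point being made about this particular case.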

/tmp is a special case, I hope you'll admit - it routinely gets cleaned
out at reboot, anyway.  If you arrange for vi backup files to go
somewhere else, you can just mkfs on every reboot if you feel like it.

>Trying to solve this problem in the manner described in Ohta's Usenix paper 
>is a mistake.  It's an inappropriate solution to the problem.

I think it's a weird hybrid of a ramdisk and stable storage, and I
think it's ugly, but I can't say it's *wrong*.

>The way that this problem is solved is via hardware.  I'll go out on a limb
>and predict what the disk drive of the future looks like:  Every drive will
>have some non volatile memory, into which go all writes.  (The size of the
>memory is derivable from file system traces  - since I/O is consistently
>bursty, the memory has to be big enough to handle a burst.)  If the system
>crashes, the drive could care less, it keeps dribbling out the writes.
>If power fails, the drive finishes the writes when it is powered back up.
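The sizing rule quoted above (the NVRAM must be big enough to absorb a burst) can be turned into arithmetic.  A minimal sketch, assuming a trace sampled in fixed ticks, a drive that drains at a constant rate, and writes that must fit in the buffer before any draining happens in that tick: the NVRAM needed is then the peak backlog.  The function name and the trace values below are illustrative, not taken from any real trace.

```python
def nvram_needed(writes_per_tick, drain_per_tick):
    """Peak backlog a write buffer must hold so no write ever has to wait.

    writes_per_tick: amount written in each tick of a (hypothetical) trace.
    drain_per_tick:  amount the drive can commit to the platters per tick.
    """
    backlog = 0
    peak = 0
    for arrived in writes_per_tick:
        backlog += arrived                           # new writes land in NVRAM
        peak = max(peak, backlog)                    # buffer must hold all of it
        backlog = max(0, backlog - drain_per_tick)   # drive dribbles some out
    return peak


if __name__ == "__main__":
    # Made-up burst, in KB per tick, against a drive draining 16 KB per tick.
    print(nvram_needed([0, 64, 64, 0, 0, 8], 16))   # 112
```

If the average write rate exceeds the drain rate, the backlog grows without bound and no buffer size is enough; the NVRAM only smooths bursts.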

I'd certainly like the NVRAM approach, as it's nice and fast,
but...

>If you must have a software solution now, I'm afraid that you are stuck with
>the tmpfs method of doing business.

See "Reimplementing the Cedar File System Using Logging and Group
Commit", Robert Hagmann, Proc. 11th ACM Symp. on Operating Systems
Principles, also known as ACM Operating Systems Review vol.21 no.5
(1987).  You don't need extra hardware.
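The idea behind the paper cited above can be sketched as a toy redo log with group commit, in the spirit of that design but not Cedar's actual format: metadata updates are buffered in memory, and one synchronous write makes a whole group durable, so many updates share a single forced I/O instead of each paying for its own.  GroupCommitLog, recover and the JSON-lines log layout are hypothetical choices made for this sketch.

```python
import json
import os
import tempfile


class GroupCommitLog:
    """Hypothetical toy redo log: updates are buffered and forced to disk in groups."""

    def __init__(self, path):
        self.path = path
        self.pending = []          # updates not yet durable
        self.forced_writes = 0     # synchronous writes actually paid for

    def update(self, key, value):
        # Cheap: only remember the change; nothing touches the disk yet.
        self.pending.append({"key": key, "value": value})

    def commit(self):
        # One forced write covers every update in the group, plus a commit marker.
        if not self.pending:
            return
        with open(self.path, "a") as f:
            for rec in self.pending:
                f.write(json.dumps(rec) + "\n")
            f.write(json.dumps({"commit": True}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.forced_writes += 1
        self.pending.clear()


def recover(path):
    """Redo only groups that reached their commit marker; ignore a torn tail."""
    state, group = {}, []
    if not os.path.exists(path):
        return state
    with open(path) as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break                  # partial last line: crash mid-write
            if rec.get("commit"):
                for r in group:
                    state[r["key"]] = r["value"]
                group = []
            else:
                group.append(rec)
    return state                       # whatever is left in `group` was uncommitted


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "fs.log")
    log = GroupCommitLog(path)
    for i in range(100):
        log.update(f"inode{i}", "used")
    log.commit()                       # 100 metadata updates, 1 synchronous write
    log.update("inode100", "used")     # never committed: pretend we crash here
    print(log.forced_writes, len(recover(path)))   # 1 100
```

The trade-off is latency rather than safety: an update is not durable until its group commits, but recovery never sees a half-applied group, and no special hardware is involved.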
-- 
	-Colin