Path: utzoo!yunexus!ists!helios.physics.utoronto.ca!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!rice!sun-spots-request
From: iapsd!hopi!glenn@uunet.uu.net (Glenn Herteg)
Newsgroups: comp.sys.sun
Subject: Re: SCSI & IPI rates
Keywords: Hardware
Message-ID: <8229@brazos.Rice.edu>
Date: 29 May 90 09:47:54 GMT
Article-I.D.: brazos.8229
Sender: root@rice.edu
Organization: Sun-Spots
Lines: 56
Approved: Sun-Spots@rice.edu
X-Refs: Original: v9n183
X-Sun-Spots-Digest: Volume 9, Issue 185, message 2

lm@sun.eng.sun.com (Larry McVoy) writes:
> Those drives have smart controllers and I believe they have zero
> latency write ability so they don't have the problem of blowing
> revs in between each write.

It seems to me that, for most file uses, you don't *want* zero-latency
writes. You'd like the performance, of course, but the downside is
increased risk of damage should the machine crash.

As I recall, adding "ordered writes" was a Feature of the first System V
release, intended to add robustness to file system consistency should the
machine crash during the file transfer. Early UNIX file systems often had
many, many problems dredged up by fsck after a crash; these days, they're
fairly rare, and I think this forced consistency has a lot to do with it.
Databases, in particular, need some kind of write-ordering semantics to
guarantee proper recording of transactions (isn't this what they call
two-phase commit?). There was a Bell Systems Technical Journal article
some years ago that discussed this issue and its relationship to UNIX
file systems.

> It is a requirement for correct operation that the data be on the
> server's drive before the server says OK to the client. If this were
> not so then you would be in serious trouble each time a server crashed.

Exactly my point, but it doesn't just apply to NFS file systems.

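To make the write-ordering point concrete, here is a minimal sketch in C,
assuming a POSIX system. The names write_then_commit, write_file_synced
and write_all, and the scheme of a data file plus a separate marker file,
are invented for illustration only; they are not taken from SunOS, System V,
or any real database. The one idea it tries to show is that the data is
pushed out with fsync() before the commit marker is written, so a crash
between the two steps leaves data without a marker, never a marker without
data. Whether fsync() really reaches the media is a separate question, and
the sketch does not retry writes interrupted by signals.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, looping over short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Create or truncate path, write buf, fsync(), then close.
 * Returns 0 only if every step succeeded. */
static int write_file_synced(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: data first, commit marker second. */
int write_then_commit(const char *data_path, const char *commit_path,
                      const char *data, size_t len)
{
    static const char marker[] = "committed\n";

    /* Step 1: the data. If this fails, no marker is ever written. */
    if (write_file_synced(data_path, data, len) < 0)
        return -1;

    /* Step 2: the marker, only after fsync() on the data returned 0. */
    return write_file_synced(commit_path, marker, sizeof marker - 1);
}
```

After a crash, a recovery pass under this scheme would trust the data file
only if the marker file exists and is complete.
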
Frankly, given the number of bugs in UNIX software (yes, even SunOS 4.1
has its share [*]), system crashes (or hangs, with user-forced reboots)
are still rather too common to ignore this issue of file system repair.
It's certainly no fun poring through a massive fsck output listing and
wondering how much ancillary damage you might incur by picking the wrong
order in which to repair the individual things that are bad in the file
system data structures, especially when you can't see the damage in
enough detail to understand what *really* went wrong (and which files
you'll need to restore from backup). And if you're not already a UNIX
guru at the moment the machine crashes, you don't stand a chance of
deciphering all that mumbledegook about inodes anyway.

I suppose that, with the file system interfaces becoming more flexible,
you might eventually be able to substitute your own kind of file system
in a particular disk partition for recording bulk data (say, from a fast
A/D converter), not caring about recovering such data in the event of a
crash. Then you'd want to add ioctl()s (or mount options) to the device
driver to tell it, on a per-partition basis, when to perform zero-latency
writes and when not to. Easy in theory, but you wouldn't want to endanger
the rest of your file systems by insufficient testing of such
capabilities.

I'm curious about a related aspect of hard disk drives and device
drivers, especially for SCSI devices, where you have this extra
microcomputer embedded on the drive and interceding between you and your
media. When fsync() or msync() returns, has the data merely been written
over the SCSI bus to the disk cache, or has it actually been written to
the media? Does this depend on the drive manufacturer, or is there some
standard SCSI command used on *all* drives that probes for this kind of
command completion?

[*] Sun is already reporting patches to SunOS 4.1 -- including one for a problem in which file system blocks show up in files to which they do not belong! Just when will 4.1.1 be out? :-)