Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1a 12/4/83; site rlgvax.UUCP
Path: utzoo!watmath!clyde!akgua!sdcsvax!sdcrdcf!hplabs!hao!seismo!rlgvax!guy
From: guy@rlgvax.UUCP (Guy Harris)
Newsgroups: net.unix-wizards
Subject: Re: Undocumented features
Message-ID: <1954@rlgvax.UUCP>
Date: Mon, 28-May-84 15:35:10 EDT
Article-I.D.: rlgvax.1954
Posted: Mon May 28 15:35:10 1984
Date-Received: Fri, 1-Jun-84 07:03:25 EDT
References: <267@pcsbst.UUCP> <1942@rlgvax.UUCP> <518@opus.UUCP> <1948@rlgvax.UUCP> <1354@pegasus.UUCP>
Organization: CCI Office Systems Group, Reston, VA
Lines: 51

> When I saw the note about the file sync'ing undocumented feature, I thought
> "Great! The people we have working on databases may have seen this, but if
> they haven't, I'll pass the note on to them." They're replies:
> > Person A:
> >
> > Person B will correct me if I'm wrong, but I believe this is a bit we had
> > already heard about. It would be quite useful, except for one minor
> > problem - it was put in for kernel use, and while it writes the data block
> > synchronously, it does NOT write the inode before returning to the user. It
> > was too bad, we thought we had found something useful before Person B spent
> > a few hours on the phone to [the USG UNIX people].
> >
> > Person B:
> >
> > Person A is correct about the utility (or lack thereof) of this feature.
> > Thanks for the information though.

> So BEWARE! If you use this "undocumented feature". There's a reason for it
> being undocumented!

This is correct. The problem is that "writei" calls "bwrite" instead of
"bdwrite" if the FSYNC flag is set in the file descriptor, *but* that's not
enough. If the B_ASYNC (asynchronous write - "bawrite") or B_DELWRI (delayed
write - "bdwrite") flag is already set in the buffer, the write will be
treated as an asynchronous or delayed write. For this to work, you'd have
to clear both those flags in the buffer before "bwrite"ing it.

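To make the flag interaction concrete, here is a toy user-space model of the
behavior described above. It is only a sketch and is NOT kernel source: the
flag values, the "write_kind" enum, and the functions "model_bwrite",
"writei_current" and "writei_proposed" are all invented for illustration, and
the rule that B_ASYNC is checked before B_DELWRI is an assumption.

```c
/* Toy model of the write-path decision discussed above; not S5 kernel code. */
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real header values are not assumed here. */
#define B_ASYNC  0x01   /* buffer marked for asynchronous write ("bawrite") */
#define B_DELWRI 0x02   /* buffer marked for delayed write ("bdwrite")      */
#define FSYNC    0x10   /* descriptor flag asking for synchronous writes    */

enum write_kind { WRITE_SYNC, WRITE_ASYNC, WRITE_DELAYED };

struct buf { int b_flags; };

/* Hypothetical stand-in for "bwrite": stale flags downgrade the write. */
enum write_kind model_bwrite(const struct buf *bp)
{
    assert(bp != NULL);
    if (bp->b_flags & B_ASYNC)
        return WRITE_ASYNC;
    if (bp->b_flags & B_DELWRI)
        return WRITE_DELAYED;
    return WRITE_SYNC;
}

/* Behavior as described: FSYNC picks "bwrite" but leaves the flags alone. */
enum write_kind writei_current(struct buf *bp, int f_flag)
{
    if (f_flag & FSYNC)
        return model_bwrite(bp);
    bp->b_flags |= B_DELWRI;            /* ordinary path: "bdwrite" */
    return WRITE_DELAYED;
}

/* Suggested fix: clear both flags first so the write really is synchronous. */
enum write_kind writei_proposed(struct buf *bp, int f_flag)
{
    if (f_flag & FSYNC) {
        bp->b_flags &= ~(B_ASYNC | B_DELWRI);
        return model_bwrite(bp);
    }
    bp->b_flags |= B_DELWRI;            /* ordinary path: "bdwrite" */
    return WRITE_DELAYED;
}
```

In this model, a buffer that already carries B_DELWRI comes back as a delayed
write from "writei_current" even with FSYNC set, while "writei_proposed"
reports a synchronous write for the same buffer.
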
The FSYNC bit is used only when writing superblocks in "update", and
directory entries in "unlink" (to make sure the directory entry is reamed
out before the inode it refers to is); since 1) you can't open a directory
for writing and 2) you obviously aren't going to "fsck" a cooked device
corresponding to a mounted file system, presumably that block of the file
system will only be written with a "bwrite". Unfortunately, it ain't so; a
"link" system call calls "wdir" which does a "bdwrite". This isn't a problem
for "link", as the S5 "link" code makes sure the inode is written to disk
before writing the directory entry, but could surprise a later "unlink" if
that block remains unwritten in the cache with B_DELWRI on.

4.2BSD (and, I believe, 4.1BSD, whose file system is the same V7 file system
as S3 and S5 use) flatly says "ALL WRITES TO DIRECTORY FILES WILL BE
SYNCHRONOUS. PERIOD."

As such, I'd vote for S5 turning off any B_ASYNC or B_DELWRI bits if the
descriptor has the FSYNC flag set, and then making the FSYNC flag a
documented and official bit with an O_FSYNC flag for "open" and "fcntl".
The only side effect might be occasional performance degradation on
directory I/O (less overlap), but more file system integrity *and* the
ability to provide database integrity. A pretty good tradeoff, in my
opinion. Besides, any such overlap due to a directory being (mistakenly)
written with a "bdwrite" is an accident anyway.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy