Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!ames!oliveb!sun!gorodish!guy
From: guy%gorodish@Sun.COM (Guy Harris)
Newsgroups: comp.bugs.4bsd,comp.unix.wizards
Subject: Re: concurrent write(2) calls write bad data to file
Message-ID: <14589@sun.uucp>
Date: Fri, 6-Mar-87 07:32:06 EST
Article-I.D.: sun.14589
Posted: Fri Mar 6 07:32:06 1987
Date-Received: Sun, 8-Mar-87 05:45:51 EST
References: <692@rtech.UUCP>
Sender: news@sun.uucp
Reply-To: guy@sun.UUCP (Guy Harris)
Organization: Sun Microsystems, Mountain View
Lines: 70
Xref: mnetor comp.bugs.4bsd:206 comp.unix.wizards:1242

>This bug appears to exist only on 4.2-derived systems.

Well, I don't know about that. You see, it's like this:

Process A does a "write" call. It grabs the current value of the file
pointer and uses it as the write offset. It then locks the inode and
goes in to write stuff. The write requires a new block to be
allocated. This may require I/O to be done; assume it does. The
process blocks waiting for the I/O to complete, and process B gets
scheduled.

Process B now does a "write" call of its own. Since process A's
"write" hasn't finished, the file pointer has NOT been updated, so
process B grabs the same offset value that process A got. It can't
write yet, though, because the inode is locked. So it waits.

Process A now finishes its I/O and finishes the "write". It unlocks
the inode and updates the file pointer by adding the number of bytes
it wrote.

Now assume that process A gives up the processor as soon as it returns
from the kernel, and process B gets the processor. Process B now
proceeds to write *its* data *on top of* the data that process A
wrote. It unlocks the inode, and returns, adding the number of bytes
*it* wrote to the file pointer.

Thus, the file pointer moves by the sum of the number of bytes
processes A and B wrote. However, only the maximum of the two byte
counts was actually written to the file.
The file pointer now points some number of bytes *past* the last byte
written; the next "write" will write at that location, leaving behind
a hole filled with - you got it - zeroes.

This is borne out by

  1) the fact that in a test case I ran (the test program was modified
     so that the parent counted *down* rather than *up*, so that the
     parent and child would be more likely to be writing different
     numbers of bytes), it clearly looked like the two processes both
     tried to write a record to the *same* location in the file - a
     location that started on a 512-byte boundary - and that the
     zeroes followed this scrambled record

and

  2) the fact that when I changed the program to put the file
     descriptor in forced-append mode (so that the writes *never*
     overlap) the problem went away.

I don't see any obvious reason why this *couldn't* happen on any UNIX
system that didn't lock the file table entry while a write was in
progress, and no system I've worked with does so. It may be that due
to the vagaries of the scheduler, and the amount of I/O done when
extending a file in small chunks, and things like that, it's *less
likely* to happen on a system using the V7 file system, but I don't
see that it's impossible on such a system.

In short, the problem is that UNIX has never been able to guarantee
that the file pointer is always valid; it's invalid while an I/O
operation is "in progress", but nothing prevents a process from using
the file pointer's value while it isn't valid. The solution is
something like "use file locking" or "use forced append mode" or "use
something else that will keep a process from using the file pointer
value while a 'write' is in progress," assuming you can arrange that.

>I think I'm also running into a variant of this problem involving
>spurious nulls being written to a pipe when a signal occurs at just
>the wrong time, and another pipe write is done in the signal handler.
Not likely in 4.2BSD, since pipes don't go through the file system, but go through the socket code.
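
For anybody who wants to see the arithmetic of the file-pointer race spelled out, here is a toy model of the interleaving of processes A and B described earlier. It is NOT kernel code and it is not the test program mentioned above; it is a sketch that replays the same sequence of events on an in-memory "file". All of the names in it (struct model, sample_offset, commit_write, interleave, count_zeroes) and the record contents and sizes are made up for illustration. The "append" flag models forced-append mode by assuming the offset is chosen from the current file size while the inode is locked, rather than from the value sampled earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILE_MAX 64   /* arbitrary size for the toy in-memory file */

/* Hypothetical model: one inode (data, size) plus one shared file
   table entry (fp). Bytes never written read back as zero. */
struct model {
    unsigned char data[FILE_MAX];
    size_t size;   /* one past the last byte ever written */
    size_t fp;     /* file pointer shared by both writers */
};

/* Step 1 of a write: read the file pointer BEFORE taking the inode lock. */
static size_t sample_offset(const struct model *m)
{
    return m->fp;
}

/* Step 2 of a write, done with the inode locked: copy the data, then
   advance the file pointer. In append mode the sampled offset is
   ignored and the current end of file is used instead. */
static void commit_write(struct model *m, size_t sampled,
                         const char *buf, size_t n, int append)
{
    size_t off = append ? m->size : sampled;

    assert(off + n <= FILE_MAX);
    memcpy(m->data + off, buf, n);
    if (off + n > m->size)
        m->size = off + n;
    /* Without append, the pointer is bumped by the byte count, whatever
       it has become in the meantime; with append, it lands at the end
       of the data just written. */
    m->fp = append ? off + n : m->fp + n;
}

/* Replay the scenario: A and B both sample the pointer before either
   write completes, then a third, uncontended write follows. */
static void interleave(struct model *m, int append)
{
    memset(m, 0, sizeof *m);

    size_t a = sample_offset(m);             /* A samples, then blocks on I/O */
    size_t b = sample_offset(m);             /* B samples the same stale value */
    commit_write(m, a, "AAAAAA", 6, append); /* A finishes first */
    commit_write(m, b, "BBBB", 4, append);   /* B writes where it sampled */

    size_t c = sample_offset(m);             /* the next write, no contention */
    commit_write(m, c, "CC", 2, append);
}

/* Count the zero bytes inside the file, i.e. the size of the hole. */
static size_t count_zeroes(const struct model *m)
{
    size_t zeroes = 0;

    for (size_t i = 0; i < m->size; i++)
        if (m->data[i] == 0)
            zeroes++;
    return zeroes;
}
```

With append off, interleave() leaves the file holding "BBBBAA", then four zero bytes, then "CC": the pointer advanced by 6 + 4 = 10 while only max(6, 4) = 6 bytes were covered, so the hole is 4 bytes long. With append on, the same sequence gives "AAAAAABBBBCC" and no zeroes. A real kernel has a good deal more going on, of course; the model only shows why sampling the pointer outside the lock is enough to produce the symptom.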