Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!uflorida!novavax!hcx1!hcx3!gwp
From: gwp@hcx3.SSD.HARRIS.COM
Newsgroups: comp.unix.wizards
Subject: Re: Record-access libraries (with q
Message-ID: <48300016@hcx3>
Date: 19 Oct 88 17:54:00 GMT
References: <287@cvbnet2.UUCP>
Lines: 44
Nf-ID: #R:cvbnet2.UUCP:287:hcx3:48300016:000:2539
Nf-From: hcx3.SSD.HARRIS.COM!gwp    Oct 19 13:54:00 1988

Written 4:21 pm  Oct 17, 1988 by jc@minya (John Chambers)
>> If you access the raw disk device do you disable that read-ahead and
>> write-behind aspect of the UNIX filesystem abstraction?

> Oh, wow!  A question with a simple answer:  Yes.  According to several
> manuals, the main difference between /dev/dsk* and /dev/rdsk* is that
> there is no buffering for the latter.  Reads always delay for physical
> I/O, and writes always go immediately to disk (though with DMA, the
> write may not be complete when write() returns).  There's also a
> warning that the raw disks should be only accessed in multiples of
> a sector.  In fact, most programs use multiples of BUFSIZ, which
> is invariably a multiple of a sector.

Maybe this is obvious, but you have to keep in mind that there is also
no "file-system" with a raw disk device. I mention this because I have
seen a number of database programs that read from and write to the raw
disk directly (for performance/safety reasons), then turn around at some
later time and access that information through a file-system (for
convenience). To do this, the database kernels did all sorts of
system-specific manipulations to mesh with the "invisible" file system
before performing their raw I/O. This all struck me as rather stupid,
because you can disable at least the write-behind portion of the buffer
cache by specifying O_SYNC when opening the file (at least you can under
System V).
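A minimal sketch of the O_SYNC alternative, in C. The helper names (open_sync, write_all, round_to_sector) are hypothetical, and the 512-byte sector size is an assumption; real sector sizes are device-specific. O_SYNC itself is a standard open() flag (System V, later POSIX): write() on such a descriptor does not return until the data has reached the device, while reads still go through the buffer cache as usual. The rounding helper illustrates the quoted warning that raw devices should be accessed only in whole-sector multiples.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed sector size for illustration only; check the device. */
#define SKETCH_SECTOR 512

/* Hypothetical helper: open PATH for writing with O_SYNC, so each
   write() waits for the data to reach the device instead of being
   left in the buffer cache for later write-behind.
   Returns the descriptor, or -1 on error. */
int open_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
}

/* Hypothetical helper: write all LEN bytes of BUF to FD, retrying
   short writes and writes interrupted by a signal.
   Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: round LEN up to a whole number of sectors,
   the unit a raw (character) disk device expects per transfer. */
size_t round_to_sector(size_t len)
{
    return (len + SKETCH_SECTOR - 1) / SKETCH_SECTOR * SKETCH_SECTOR;
}
```

Only the write path is synchronous here; a database that also wants to bypass read caching would still need raw I/O or a system-specific mechanism.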
> The exact wording in one of the manuals describes the "'raw' interface
> which provides for direct transmission between the disk and the user's
> read or write buffer.  A single read or write call results in exactly
> one I/O operation and therefore raw I/O is considerably more efficient
> when many words are transmitted."  Note the specific claim that the
> transfer is direct between the disk and the buffer in user space,
> without going through a kernel buffer.

Not to get into any wild plugging, but we've worked out a method for
doing the same thing with a mounted file system, i.e. transferring the
data directly from the user's address space to the disk without going
through the kernel buffer. Interestingly enough, the main performance
gain with this method doesn't come from avoiding the buffer copying but
from the fact that you can do single transfers of up to 256K rather than
32 individual transfers of 8K (our block size). Of course, this assumes
the ability to lay out 32 disk blocks contiguously.

Gil Pilz   -=|*|=-   Harris Computer Systems   -=|*|=-   gwp@ssd.harris.com