Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!seismo!hao!hplabs!sri-unix!sdyer@bbn-unix
From: sdyer%bbn-unix@sri-unix.UUCP
Newsgroups: net.unix
Subject: Re: Raw vs. block device. I'm confused.
Message-ID: <15192@sri-arpa.UUCP>
Date: Thu, 5-Jan-84 19:31:48 EST
Article-I.D.: sri-arpa.15192
Posted: Thu Jan 5 19:31:48 1984
Date-Received: Mon, 9-Jan-84 00:15:10 EST
Lines: 48

From: Steve Dyer

Reading and writing on a disk block device participate in the kernel's
buffer cache. That is, data transfers occur between the user's address
space and the buffers in the buffer cache, possibly implying that no
I/O was performed immediately (i.e. on a read the buffer might have
already been present in the cache, and on a write, the actual I/O
request would be enqueued, but not yet performed.) Note that when the
number of bytes to be transferred is greater than the UNIX system's
buffer size, BSIZE (usually 512 or 1024), the single request given by
the user program must be broken up into multiple requests to fill a
system buffer.

"Raw" disk I/O occurs directly between the user program and the
hardware device, bypassing any buffering. Raw I/O is faster than
"cooked" I/O for two reasons: first, since data is DMA'ed directly
into the user's address space, one avoids the CPU overhead of having
to copy bytes to/from an intermediate buffer. More importantly, when
performing disk operations like "?check", "fsck" or a disk-to-disk
copy, all of which need to read multiple contiguous physical blocks,
it is often possible (depending on the controller) to read multiple
sectors in a single DMA operation. The same I/O request on the block
device would have to be split into several operations, almost
certainly losing revolutions between successive requests.

Adb'ing the raw disk device doesn't work because of physio(), the
mediator of raw "dma-type" requests. Physio() hands to the disk device
strategy routine the "block number" of the request.
The block number is derived quite simply as u.u_offset>>BSHIFT.
u.u_offset is the current "lseek" position of the open raw device
file, BSHIFT is log2(BSIZE). Thus, all RAW I/O operations must occur
on a BSIZE boundary. (Not only MUST, but DO! It's quite surprising
the first time you attempt raw I/O on a non-BSIZE boundary and find
that you've trashed the beginning of the block!) Adb, like most UNIX
programs, simply lseeks to the desired spot and starts writing.

Think about it. The primitive writable object on the surface of a
disk is a sector, which is usually 512 bytes. To write on a disk
device at other than a sector boundary would require reading the old
sector into memory, modifying it, and writing it out again, something
the raw device cannot do, but which the block device handles quite
well, since its higher levels have already taken care of that.

Now, you might ask why physio() truncates at BSIZE rather than
SECTORSIZE (since they are no longer, since V7, one and the same.)
I suspect it's merely a convenience, saving an extra manifest constant
to keep track with reality.

/Steve Dyer
sdyer@bbncca
decvax!bbncca!sdyer
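
For anyone who wants to see the truncation spelled out, here is a
rough user-level sketch in C. It is not kernel source: BSHIFT is
assumed to be 9 (BSIZE of 512) purely for illustration, the "disk" is
just an array in memory, and raw_block(), raw_start(), raw_write_sim()
and rmw_write() are made-up names. It only mimics the u.u_offset>>BSHIFT
arithmetic described above and the read-modify-write step that the
block device performs and the raw device does not.

```c
#include <assert.h>
#include <string.h>

/* Assumed for illustration; the real BSIZE is system dependent. */
#define BSHIFT 9
#define BSIZE  (1L << BSHIFT)

/* Block number as described for physio(): the low-order bits of the
 * seek offset are simply dropped. Offsets are assumed non-negative. */
long raw_block(long offset)
{
    return offset >> BSHIFT;
}

/* Byte position at which a raw transfer would really begin. */
long raw_start(long offset)
{
    return raw_block(offset) << BSHIFT;
}

/* Hypothetical model of a raw write on an in-memory "disk": the data
 * lands at the start of the block, not at the requested offset. */
void raw_write_sim(char *disk, long offset, const char *src, long len)
{
    memcpy(disk + raw_start(offset), src, (size_t)len);
}

/* Hypothetical read-modify-write, the step done on the caller's behalf
 * by the block device. Assumes the write stays inside one block. */
void rmw_write(char *disk, long offset, const char *src, long len)
{
    char blk[BSIZE];
    long start = raw_start(offset);
    long within = offset - start;

    assert(within + len <= BSIZE);
    memcpy(blk, disk + start, (size_t)BSIZE);   /* read the old block    */
    memcpy(blk + within, src, (size_t)len);     /* modify it in memory   */
    memcpy(disk + start, blk, (size_t)BSIZE);   /* write the whole block */
}
```

With BSIZE taken as 512, raw_start(1000) comes out as 512, so in this
model a raw write aimed at byte 1000 overwrites the bytes starting at
512, while rmw_write() leaves everything before byte 1000 alone.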