Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!harpo!seismo!hao!hplabs!sri-unix!v.wales@ucla-locus
From: v.wales%ucla-locus@sri-unix.UUCP
Newsgroups: net.unix
Subject: Re: Raw vs. block device
Message-ID: <15255@sri-arpa.UUCP>
Date: Fri, 6-Jan-84 14:11:37 EST
Article-I.D.: sri-arpa.15255
Posted: Fri Jan 6 14:11:37 1984
Date-Received: Fri, 13-Jan-84 03:55:43 EST
Lines: 138

From: Rich Wales

Jonathan --

Here is an attempt on my part to describe "block" and "raw" I/O in as
much detail as reasonably possible. If I have inadvertently made some
misstatement, or left out some important feature, I trust one of the
other "veterans" on this list will correct me.

UNIX has two kinds of device interfaces: "block", and "character"
(also called "raw"). I'll discuss here the "raw" interface first,
since it is the "lower-level" of the two, and since virtually all
devices with block interfaces will have a raw interface as well.

RAW (CHARACTER) DEVICE INTERFACE

Generally speaking, the "raw" interface to a device gives you direct
control over that device. If you do a "read" system call on a disk
via the "raw" interface, for example, you will generally invoke a
single input operation on that disk to read your data. (There may be
exceptions here; for example, I once wrote a "raw" device driver for
an RX02 floppy disk, and since this device can read or write only one
sector at a time, I implemented long "read" or "write" requests via
multiple I/O commands to the drive.)

Raw I/O is "synchronous": I/O operations are always done in the order
requested. There can never be more than one raw I/O request pending
per device. In 4.1BSD, this restriction is generally implemented by
having the driver declare a single "buf" structure per device for all
raw I/O on that device.
All raw I/O for the device goes through a routine called "physio" (in
dev/bio.c); "physio" in turn checks and manipulates a "busy" status
bit in the "buf" structure, using the kernel's "sleep"/"wakeup"
facility to force requests on a busy "buf" structure to wait.

Raw I/O is generally subject to any requirements imposed by the
hardware itself. For example, if a given disk demands (as most do)
that all I/O operations start on a sector boundary and comprise an
integral number of full sectors, then you must observe this
restriction when doing raw I/O on that disk.

If you do try to read/write random amounts of data at random places
on a disk via a raw interface, you are likely to get unpredictable
results. (In particular, a misaligned "write" is liable to trash
innocent data.) If the driver is well written and checks for this
situation, you may get an explicit error, but you shouldn't in
general depend on this. This, by the way, is why you can't use "adb"
on a raw device.

In the case of my RX02 driver which I mentioned earlier, by the way,
I chose to implement multi-sector "read"s and "write"s as a
convenience to the user. I could have forbidden them (because the
RX02 hardware doesn't support them) and have been perfectly within
the philosophy of raw I/O interfaces by so doing. My driver still
required all transfers to start on sector boundaries and comprise an
integral number of full sectors, though -- and I explicitly tested
for violations of this constraint before doing the I/O.

Raw I/O on terminal lines is somewhat complicated by the use of the
"clist" mechanism (see sys/prim.c). Hence, terminal I/O may be to
some extent asynchronous, even though a "raw" interface is in use.

BLOCK DEVICE INTERFACE

The block interface (if one exists) to a device goes through a
complicated buffering/caching scheme. A number of buffers (each one
1024 bytes long in 4.1BSD, or 512 bytes long in Version 7) are
allocated by the kernel for block I/O.
Each buffer is labelled with the device (major/minor) and block
numbers, so that repeated references to the same block do not result
in actual "read" operations if the block is already in main memory.

Each buffer has a "dirty" bit, so that the data is not written back
to disk immediately upon the issuance of a "write" system call. Data
is written back when the buffer is needed for another block (LRU
caching strategy); when a "sync" system call is issued by a process;
or when a block device is closed and (if it was mounted) unmounted.

A "block" driver interface to a device is free to perform I/O
operations in any order it sees fit -- not necessarily the order in
which "read" or "write" system calls were issued. (Hence, while raw
I/O is "synchronous", block I/O is "asynchronous".) Most disk drivers
use a queue of pending I/O requests for each drive, sorted in order
by cylinder so as to allow the disk arm to sweep back and forth
across the surface in "elevator" fashion. In a "raw" interface, on
the other hand, there is no need for a queue of pending requests,
since by definition only one raw I/O request can ever be pending for
any given device.

The buffering scheme allows you to do I/O with arbitrary byte offsets
and byte counts, even if the device itself does not support such
access. For example, if you want to write a single byte in the middle
of a block using the block interface, the kernel will read in the
entire block and then change the single byte in question. An I/O
operation which spans multiple blocks (perhaps starting in the middle
of one block and ending in the middle of another) is handled in a
similar fashion.

The block I/O mechanism is used by the routines which implement
regular file I/O, needless to say.

WHICH DEVICES ARE BLOCK? WHICH DEVICES ARE RAW?

In general, every device will have a raw interface. Additionally, a
device on which it would make sense to put a file system (i.e.,
disks) will generally have a block interface.
Most tape drivers also have a block interface, although I have never
had occasion to access a tape by anything but the raw interface.

If you are doing a "dd" (byte-for-byte copy) of a large area of disk
(say, for example, that you are moving a file system from one part of
the disk to another), you should probably use the raw interface,
since it is far more efficient than the block interface. In
particular, large block sizes in "dd" can generally be handled by the
raw disk interfaces, whereas the block interface will cut a large
transfer down into 1K-byte chunks.

Terminals have only a raw interface. Also, such "funny" files as
/dev/null and /dev/kmem are implemented via raw interfaces. (Of
course, you can still do I/O on /dev/kmem from random offsets and
with random byte counts, since memory does not have the alignment
restrictions that a disk does.)

DEVICE SPECIAL FILES AND RAW VS. BLOCK I/O

There are two kinds of device special files in UNIX: raw and block.
The major device number of a device special file is associated with a
set of device-driver routines via one of two tables in dev/conf.c:
"cdevsw" ("c" = "character" = "raw") for raw devices, and "bdevsw"
("b" = "block") for block devices. In particular, note that there is
no necessary relationship whatsoever between "raw" major device
number N and "block" major device number N.

I hope this covers your question adequately. If not, let me know and
I will try and supply additional information.

-- Rich
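
Postscript: the two-table arrangement described under DEVICE SPECIAL
FILES can be sketched in a few lines of user-space C. This is only an
illustration, not the contents of dev/conf.c: the entry structures are
cut down to one entry point each, and the driver routines
(disk_strategy, disk_raw_read, tty_read), the "routine" enum, and the
dispatch() helper are all hypothetical names made up for the example.
The only thing it is meant to show is that a major number is an index
into one table or the other, so the same number N can select
unrelated drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Which (hypothetical) driver routine a dispatch ended up in. */
enum routine { NO_DRIVER, DISK_STRATEGY, DISK_RAW_READ, TTY_READ };

/* Cut-down switch-table entries: one entry point each.  The real
 * tables carry more (open, close, and so on). */
struct bdevsw_entry { enum routine (*d_strategy)(void); };
struct cdevsw_entry { enum routine (*d_read)(void); };

/* Hypothetical drivers; each just reports which routine was reached. */
static enum routine disk_strategy(void) { return DISK_STRATEGY; }
static enum routine disk_raw_read(void) { return DISK_RAW_READ; }
static enum routine tty_read(void)      { return TTY_READ; }

/* In this made-up configuration, block major 0 is the disk ... */
static const struct bdevsw_entry bdevsw[] = {
    { disk_strategy },      /* block major 0 */
};

/* ... but raw major 0 is the terminal; the raw disk is raw major 1. */
static const struct cdevsw_entry cdevsw[] = {
    { tty_read },           /* raw major 0 */
    { disk_raw_read },      /* raw major 1 */
};

#define NBLKDEV (sizeof bdevsw / sizeof bdevsw[0])
#define NCHRDEV (sizeof cdevsw / sizeof cdevsw[0])

/* Pick the table from the special file's type ('b' or 'c', as shown
 * by "ls -l"), then index it by major number.  Returns NO_DRIVER when
 * the major number is past the end of the chosen table. */
enum routine dispatch(char type, size_t major)
{
    assert(type == 'b' || type == 'c');
    if (type == 'b')
        return major < NBLKDEV ? bdevsw[major].d_strategy() : NO_DRIVER;
    return major < NCHRDEV ? cdevsw[major].d_read() : NO_DRIVER;
}
```

With these tables, dispatch('b', 0) lands in the disk's block routine
while dispatch('c', 0) lands in the terminal driver; the raw side of
the same disk sits at raw major 1.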