Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!wuarchive!decwrl!shlump.nac.dec.com!shodha.dec.com!alan
From: alan@shodha.dec.com ( Alan's Home for Wayward Notes File.)
Newsgroups: comp.unix.ultrix
Subject: Re: Any disk de-fragmenters out there?
Summary: Fragmentation of the Fast File System.
Message-ID: <504@shodha.dec.com>
Date: 1 Dec 89 18:48:18 GMT
References: <2095@compugen.> <7862@bunny.GTE.COM>
Distribution: comp
Organization: Digital Equipment Corp. - Colorado Springs, CO.
Lines: 110

In article <7862@bunny.GTE.COM>, krs0@GTE.COM (Rod Stephens) writes:
> I was at a DECUS seminar where someone asked how disk fragmentation
> effected performance on Unix systems. The answer was that the file
> system works best if things are NOT contiguous. (This started a series
> of jokes about disk fragmenters ;) Unfortunately that's all I know.
> Does anyone know what this guy meant?

First you should refer to the article "A Fast File System for UNIX*"
by Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler and
Robert S. Fabry. A copy is in the System Manager volume of the
ULTRIX+ Supplementary Documents.

ULTRIX uses the Berkeley Fast File System (FFS) for local disk
accesses. Many other UNIX systems also use FFS, but some probably
still use the old UNIX file system. Among the features of the FFS
are:

	o  Optimal storage utilization (blocks and fragments).
	o  File system parameterization (rotation layout).
	o  A different block layout policy.

There are others, but these are the ones that reduce the effects of
fragmentation in the file system.

The first allows a large allocation block size for files (4KB and
8KB in the ULTRIX implementation). Left to itself, though, this would
waste large amounts of space when many small files are created. To
reduce the amount of wasted space, small files are allowed to
allocate fragments of blocks. Fragment sizes of 1/8, 1/4 and 1/2 of
the block size, as well as the full block size, are allowed.
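As a rough illustration of the block-and-fragment idea, here is a small
Python sketch that works out how much disk space a file of a given size
would occupy. It assumes 8KB blocks with 1KB fragments (1/8 of a block),
uses fragments only for the tail of the file, and ignores indirect
blocks and other metadata. The function name ffs_allocation is made up
for this example; it is not taken from the FFS sources.

```python
def ffs_allocation(file_size, block_size=8192, frags_per_block=8):
    """Return (full_blocks, tail_fragments, bytes_allocated) for a file.

    Simplified model (an assumption, not the real allocator): whole
    blocks hold the bulk of the file and the remainder is rounded up to
    fragments of block_size / frags_per_block bytes.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if frags_per_block not in (1, 2, 4, 8):
        raise ValueError("frags_per_block must be 1, 2, 4 or 8")

    frag_size = block_size // frags_per_block
    full_blocks, tail = divmod(file_size, block_size)

    # Round the tail up to a whole number of fragments.
    tail_frags = -(-tail // frag_size)

    # A tail that needs every fragment of a block is just a full block.
    if tail_frags == frags_per_block:
        full_blocks += 1
        tail_frags = 0

    allocated = full_blocks * block_size + tail_frags * frag_size
    return full_blocks, tail_frags, allocated


if __name__ == "__main__":
    for size in (100, 3000, 8192, 20000):
        blocks, frags, used = ffs_allocation(size)
        print(size, "bytes ->", blocks, "blocks +", frags,
              "fragments =", used, "bytes on disk")
```

In this model a 100 byte file occupies one 1KB fragment instead of a
whole 8KB block, which is the space saving the fragment scheme is
after.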
Both blocks and fragments are allocated contiguously, so for files
whose sizes are less than or equal to the block size, fragmentation
isn't a problem.

The 2nd feature attempts to lay out file system blocks at
rotationally optimal locations for the capability of the disk. A
number of file system parameters are provided for tuning the file
system to work best for the disk it is on(1). The optimal location
for a block is calculated based on the rotational speed of the disk
and the time it takes the system to dispatch an I/O request to the
disk.

A simple example of this might be a file that consists of two 8KB
blocks. Even if the file system is doing read-ahead, it will take two
reads to read the file (one for each 8KB block). If the blocks are
allocated contiguously, it is possible the 2nd block will have rotated
past the disk head before the request gets to the disk, and so you'll
have to wait for the block to come back around. If a gap is placed
between the blocks that is long enough to allow the 2nd request to
show up, the request can be satisfied more quickly. If the
disk/controller hardware allows it, it is possible to specify long
strings of blocks that can be read/written contiguously(1).

The effect of rotational optimization on fragmentation is that files
are already fragmented in such a way as to allow for optimal
sequential access at the file system block size. Depending on the
disk speed, controller speed and latency, and CPU speed, it may be
best to have these rotational gaps, or it may be best to lay out the
blocks as contiguously as possible. This is something that you may
have to determine by experimentation for your hardware and file
system access.

The third feature is an attempt to keep directories of files close
together and spread the free space equally across the disk. The disk
is divided into groups of cylinders, where each group is treated
like a small file system. It has a copy of the superblock, a section
of inodes and free space.
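To put some numbers on the rotational gap argument made earlier, here
is a small Python sketch of the two-block example. It is only a toy
model: it assumes both blocks sit on the same track, that the track
holds a whole number of equal block slots, and that the only costs are
rotation and the time to dispatch the second request. The 3600 RPM
speed, the 4 blocks per track, the 2 ms dispatch time and the function
name delay_to_next_block_ms are all assumptions picked for
illustration, not figures from ULTRIX or the FFS paper.

```python
import math


def delay_to_next_block_ms(gap_blocks, dispatch_ms,
                           rpm=3600, blocks_per_track=4):
    """Time from the end of one block read until the next block
    starts passing under the head, in milliseconds.

    gap_blocks  -- empty block slots left between the two blocks
    dispatch_ms -- time the system needs to issue the second request
    """
    if not 0 <= gap_blocks < blocks_per_track:
        raise ValueError("gap_blocks must fit on one track")

    rotation_ms = 60000.0 / rpm               # one full revolution
    block_ms = rotation_ms / blocks_per_track  # one block slot

    # First moment the start of the next block reaches the head.
    first_arrival = gap_blocks * block_ms
    if first_arrival >= dispatch_ms:
        return first_arrival

    # The request was late, so wait whole revolutions until the block
    # comes around again after the request is ready.
    missed = math.ceil((dispatch_ms - first_arrival) / rotation_ms)
    return first_arrival + missed * rotation_ms


if __name__ == "__main__":
    for gap in (0, 1, 2):
        print("gap of", gap, "blocks:",
              round(delay_to_next_block_ms(gap, dispatch_ms=2.0), 2), "ms")
```

With these made-up numbers the contiguous layout (gap of 0) misses the
second block and waits a full revolution, while a one-block gap picks
it up on the same pass. If the dispatch time were effectively zero,
the contiguous layout would win, which is why the best setting depends
on your hardware.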
When a new directory is created, it is placed in the cylinder group
with the most free space. Files created in that directory are
allocated to the same cylinder group. To try to keep a reasonable
amount of free space in the cylinder group, large files are limited
in the amount of space they can use out of one cylinder group, and
the rest of the file is allocated in other groups. If a cylinder
group is full, a quadratic hash is used to find space, and if that
fails an exhaustive search is performed.

Performance studies of the FFS at Berkeley showed that the
performance was fairly constant until the file system reached around
90% full. This is the reason that the file system attempts to keep
10% free. This threshold can be adjusted with tunefs(8), but if it is
going to be a long-term situation you should attempt to find more
free space somewhere.

One potential disadvantage of the block layout is that files get
scattered all over the disk. The files in a given directory may be
close together, but two different directories (two users, for
example) may be far apart. To help get around this, Berkeley added
request sorting to the disk drivers so that when the disk queues
became full the requests would be served in such a way as to get the
best global throughput from the disk. The Digital Storage
Architecture (DSA) controllers also do request sorting. In ULTRIX-32
V1.0 the DSA (UDA50) driver still had the Berkeley sort routine in
it. It was removed in V1.1 in the belief that there was no need to
sort the requests twice.

I believe that most of what I have written is accurate, but I haven't
read the FFS article recently, so my memory may be faulty. Any
corrections would be appreciated.

*UNIX is a trademark of AT&T.
+ULTRIX is a trademark of Digital Equipment Corporation.

1. Refer to the tunefs(8) manual page.

>
> Rod Stephens
--
Alan Rollow				alan@nabeth.enet.dec.com