Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!husc6!hao!ames!elroy!cit-vax!mangler
From: mangler@cit-vax.Caltech.Edu (Don Speck)
Newsgroups: comp.unix.wizards
Subject: Re: Disk striping? (4.3 BSD)
Message-ID: <5446@cit-vax.Caltech.Edu>
Date: 15 Feb 88 09:19:12 GMT
References: <2369@emory.uucp>
Organization: California Institute of Technology
Lines: 37

On December 10 I wrote that disk striping has a couple of rather serious restrictions. At the beginning of February I had a pressing need for disk striping (to piece together two small partitions into a usable-size filesystem after losing a disk), so I finally debugged the striping pseudo-device driver that I'd written, and found that neither restriction was necessary.

The basic method is for the strategy routine to copy the buf, fudge the dev/blkno fields in the copy, and set B_CALL in the copy (NOT in the original). At iodone time, a routine is called that copies b_resid, b_error, and (only) the B_ERROR bit of b_flags back into the original buf, and does an iodone() on the original. The temporary buf is then freed.

To avoid ever having to sleep on buf allocation, requests that cannot immediately get a buf are linked into a list. Because the bufs come from a private pool, we're assured that one will soon be freed up by an interrupt, and when that happens the list of waiting requests is examined. Ripping off swap buffers doesn't work, since the swapper may hog them, and the only way it has to tell you when one becomes free is via sleep/wakeup, which strategy routines are NOT supposed to use.

With those changes, it should be safe to use with Sun ND, etc. (I don't see why it couldn't be used recursively if you wanted to.)

I tried various interleave factors and found that with a single disk controller, it's best to interleave by cylinders. Interleaving by filesystem blocks messes up the rotdelay optimization.
Reading large files does not go any faster than with a single disk; you only gain throughput if you have several independent readers.

cit-vax has been using this to hold netnews since February 7. The code can be obtained by anonymous ftp from csvax.caltech.edu (10.1.0.54), file pub/stripe.tar. Feedback is welcome; this is still pretty experimental.

Don Speck
speck@vlsi.caltech.edu
{amdahl,ames!elroy}!cit-vax!speck