Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.protocols.nfs
Subject: Re: Incremental sync()s and using disk idle time
Message-ID: <848@spim.mips.COM>
Date: 10 Mar 91 05:12:01 GMT
References: <1991Mar7.115154.4820@hq.demos.su> <1991Mar8.142031.9098@bellcore.b
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 126
Nntp-Posting-Host: winchester.mips.com

In article kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
...
>Actually I have been having quite the opposite thoughts lately. It seems to me
>that it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.
>I don't just mean what requests are satisfied first (although this would
>be one of its tasks) but one which also supports an abstract filesystem.
>This would take out a lot of logic from the kernel, it needn't spend time
>through namei() et al. let an intelligent controller do that.
>
>Am I missing something important here, other than the fact that no operating
>system I am aware of has a concept of an abstract filesystem (except at
>the user level). There is still some logic needed re virtual memory and
>possible page-outs etc but I think it could work. Any comments?

It's been done; I doubt that I'd do it again, personally.
I think there are only a few design points where this makes sense.

Let's see the reasons why this often doesn't make sense, based on
the standard sort of system partitioning analysis:
	1) Where is the data?
	2) Where are the processors and how fast are they?
	3) How frequent and large are the data movements needed between
	storage(s) and processor(s)?

So, consider some of the common cases:

1) 1 CPU, main memory, disk controller(s) with no significant local storage
(other than track caches, for example).
This is pretty straightforward: CPU does all of the work.
If what you're doing is a lot of file thrashing, that uses close to
100% of the CPU; if there isn't much, then almost 100% of the CPU can be
used for other things. Buffer cache usage is pooled with other memory usage
(assuming any of the current UNIXes that do dynamic buffer cache sizing
that adapts to the mix of buffer cache versus other uses).

2) CPU, main memory, disk controller that runs the filesystem.
Two cases of interest:

2a) Disk controller CPU is substantially slower than main CPU.
Now, compared to the previous case:

The CPU gains back some of the time spent doing filesystem stuff.

It ends up spending some of its time synchronizing with the disk
controller CPU, because the interactions with the controller are
much more frequent than if only disk I/Os need interactions.
(This can be seen from sar-type statistics, if you compare
#'s of logical operations/nameis/stats, etc, with physical I/Os.)
For example, every namei goes to the controller, every stat, fstat.
(I mention these because at least some of them sometimes do very little
movement of data, compared with the overhead of setting them up.)

The controller CPU must also have access to the memory maps used by
the CPU (after all, read/write system calls easily cross page
boundaries) and either the CPU must do substantial checking before
handing over the request, or else the controller CPU needs to
request the main CPU to diddle page table entries for it at
appropriate times, initiate page-ins, etc.

You need to build a different memory system to avoid unnecessary
interference, compared to the previous case. (In the previous case,
USUALLY, most of the accesses from controller <-> main memory
are for moving data to/from disk (or track cache). There are some
accesses to I/O control information, but this should be a small
fraction of the amount of data transferred. Hence, one can
use block transfers fairly often, which minimizes memory interference.)
For this case, the disk controller must frequently rummage around in main
memory. Maybe these accesses will fit block transfers, maybe not....
If they don't, both CPUs will suffer.

FINALLY, and this is the worst part, if the disk controller CPU
is substantially SLOWER than the main CPU, all of this can easily
run slower than the main CPU by itself, because you must wait
until the controller completes some action. (I have seen this
happen more than once in real life.) SO:

2b) Use a disk controller that is as fast as the main CPU.
This may be better, although the cost has now gone up substantially,
for better memory busses, caches for the controller CPU, probably, etc.
BUT NOW:
You will need to make this work with more than 1 disk controller
(which will not be trivial). Even with 1 CPU and 1 controller,
you have something that's looking (from a hardware viewpoint) more and
more like a symmetric multiprocessor:

3a) Symmetric multiprocessor: shared memory, more than 1 CPU that
can run the file system, and I/O controllers that look like the simple
ones of case 1 above.

So, suppose to avoid some of the problems of memory bandwidth and such,
you say:

3b) The disk controller(s) have their own private memory(s).
Now, if you study this case, you discover that you impact the main CPU(s)
worse, because you end up with multiple transfers from the controller memory
into main memory (again, look at sar data to convince yourself of this).
Even worse, with multiple controllers, you now have a fixed amount
of buffer cache per controller, which generally does not work
as well as 1 large memory pool that can hold whatever is currently
being used, given equal total amounts of memory.
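
To make the fixed-cache-per-controller point concrete, here is a minimal
sketch in Python. Everything in it is an assumption made for illustration,
not a measurement of any real system: LRU replacement, a total of 8 cache
blocks, 2 controllers, and a made-up workload in which disk 0 has a working
set of 6 blocks and disk 1 a working set of 2. The names LRUCache,
make_workload, run_pooled and run_partitioned are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU block cache that only counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # key -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[key] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def make_workload(rounds):
    """Assumed workload: (disk, block) references, repeated `rounds` times.

    Disk 0 cycles through 6 blocks, disk 1 through 2 blocks.
    """
    one_round = [(0, 0), (0, 1), (0, 2), (1, 0),
                 (0, 3), (0, 4), (0, 5), (1, 1)]
    return one_round * rounds


def run_pooled(workload, total_blocks):
    """One shared pool holding blocks from every disk."""
    cache = LRUCache(total_blocks)
    for ref in workload:
        cache.access(ref)
    return cache.hits


def run_partitioned(workload, total_blocks, n_controllers):
    """Same total memory, split evenly into one private cache per controller."""
    per_controller = total_blocks // n_controllers
    caches = [LRUCache(per_controller) for _ in range(n_controllers)]
    for disk, block in workload:
        caches[disk].access((disk, block))
    return sum(c.hits for c in caches)


workload = make_workload(rounds=10)
pooled_hits = run_pooled(workload, total_blocks=8)
partitioned_hits = run_partitioned(workload, total_blocks=8, n_controllers=2)
print("references:", len(workload))
print("pooled hits:", pooled_hits)
print("partitioned hits:", partitioned_hits)
```

With this particular workload the 8 distinct blocks all fit in the shared
pool, so only the first round misses. Split 4 and 4, disk 0's six-block
cycle no longer fits and LRU evicts each block just before it is needed
again, while half of disk 1's cache sits idle. Real reference streams are
not this tidy, so treat the numbers as a picture of the effect, not a
prediction.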

I think the bottom line is: the only times I've ever seen this kind of
thing work were when the main CPU was out of gas, and one could
more easily make a smart I/O processor help offload it, and the nature
of the restructuring was that:
	a) The controller CPU could convert numerous small interactions
	with a device into much less frequent, but larger, interactions
	with the CPU's main memory.
	b) The CPU/controller interaction was fairly minimal,
	or the controller provides a more efficient performance model
	without changing the programming model, especially for
	operating systems that don't already have disk caching built in
	so strongly as UNIX. (For instance, mainframe disk caching
	controllers, including battery-backed-up memory, might fit here.)

However, as a bottom line for UNIX systems, it often seems that you
start with dumb controllers, but if you keep making them smarter, they
rapidly get to being symmetric multis .... with dumb controllers again...
-- 
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086