Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.protocols.nfs
Subject: Re: Incremental sync()s and using disk idle time
Message-ID: <848@spim.mips.COM>
Date: 10 Mar 91 05:12:01 GMT
References: <1991Mar7.115154.4820@hq.demos.su> <1991Mar8.142031.9098@bellcore.b <KINCH.91Mar9170121@no31sun.csd.uwo.ca>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 126
Nntp-Posting-Host: winchester.mips.com

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
...
>Actually I have been having quite the opposite thoughts lately. It seems to me
>that it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.

>I don't just mean what requests are satisfied first (although this would 
>be one of its tasks) but one which also supports an abstract filesystem. 
>This would take out a lot of logic from the kernel, it needn't spend time
>through namei() et al. let an intelligent controller do that. 
>
>Am I missing something important here, other than the fact that no operating
>system I am aware of has a concept of an abstract filesystem (except at 
>the user level). There is still some logic needed re virtual memory and 
>possible page-outs etc but I think it could work. Any comments?

It's been done; I doubt that I'd do it again, personally.
I think there are only a few design points where this makes sense.
Let's look at why it often doesn't, using the standard sort of
system partitioning analysis:
	1) Where is the data?
	2) Where are the processors and how fast are they?
	3) How frequent and large are the data movements needed
	between storage(sw) and processor(s)?
So, consider some of the common cases:

1) 1 CPU, main memory, disk controller(s) with no significant local
storage (other than track caches, for example).
	This is pretty straightforward:
	CPU does all of the work.
		If what you're doing is a lot of file thrashing, that
		uses close to 100% of the CPU; if there isn't much,
		then almost 100% of the CPU can be used for other things.
	Buffer cache usage is pooled with other memory usage
	(assuming one of the current UNIXes whose dynamic buffer
	cache sizing adapts to the mix of buffer cache versus
	other uses).

2) CPU, main memory, disk controller that runs the filesystem.
Two cases of interest:
	2a) Disk controller CPU is substantially slower than main CPU.
	Now, compared to the previous case:
	The CPU gains back some of the time spent doing filesystem stuff.
	It ends up spending some of its time synchronizing with the
	disk controller CPU, because the interactions with the controller
	are much more frequent than if only disk I/Os need interactions.
	(This can be seen from sar-type statistics, if you compare
	the number of logical operations/nameis/stats, etc., with physical
	I/Os.)  For example, every namei, stat, and fstat goes to the
	controller.  (I mention these because at least some of them
	sometimes move very little data compared with the overhead
	of setting them up.)
	The controller CPU must also have access to the memory maps
	used by the CPU (after all, read/write system calls easily
	cross page boundaries), and either the CPU must do substantial
	checking before handing over the request, or else the controller
	CPU needs to ask the main CPU to diddle page table entries
	for it at appropriate times, initiate page-ins, etc.
	You need to build a different memory system to avoid
	unnecessary interference, compared to the previous case.
		(In the previous case, USUALLY, most of the accesses from
		controller <-> main memory are for moving data to/from
		disk (or track cache).  There are some accesses to I/O
		control information, but this should be a small fraction
		of the amount of data transferred.  Hence, one can use
		block transfers fairly often, which minimizes memory
		interference.)
	In this case, the disk controller must frequently rummage around
	in main memory.  Maybe these accesses will fit block
	transfers, maybe not.  If they don't, both CPUs will suffer.
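	To make the page-boundary point concrete, here is a minimal
	Python sketch of which virtual pages a single read() buffer
	spans.  It is only an illustration: the 4 KB page size and the
	name pages_touched are assumptions, not taken from any
	particular UNIX.  A controller that runs the filesystem would
	need a valid translation, and possibly a page-in, for each page
	returned.

```python
# Hypothetical illustration: which virtual pages does one read()/write()
# buffer touch?  PAGE_SIZE is an assumed value, not any specific machine's.
PAGE_SIZE = 4096


def pages_touched(user_addr, nbytes, page_size=PAGE_SIZE):
    """Return the virtual page numbers covered by [user_addr, user_addr + nbytes)."""
    if nbytes <= 0:
        return []
    first = user_addr // page_size
    last = (user_addr + nbytes - 1) // page_size
    return list(range(first, last + 1))


# A 6 KB read into a buffer that starts 3000 bytes into page 0
# spans three pages, each needing a translation (and maybe a page-in).
print(pages_touched(3000, 6144))  # prints [0, 1, 2]
```

	Every one of those pages is a point where the controller either
	trusts checking the main CPU did up front, or has to go back and
	ask it for help.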

	FINALLY, and this is the worst part, if the disk controller CPU
	is substantially SLOWER than the main CPU, the whole arrangement
	can easily run slower than the main CPU would by itself, because
	you must wait until the controller completes some action.
	(I have seen this happen more than once in real life.)
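	A toy latency model shows why.  In the Python sketch below, all
	parameters are hypothetical: work is in abstract instruction
	units, speeds are units per microsecond, and sync_cost_us stands
	for the per-operation cost of handing a namei/stat/etc. to the
	controller and getting the answer back.  It compares case 1 with
	case 2a and computes the controller speed needed merely to break
	even.

```python
# Toy per-operation latency model (all parameters hypothetical).
# Work is in abstract "instruction units"; speeds are units per microsecond.


def local_latency_us(work_per_op, main_speed):
    """Case 1: the main CPU runs the filesystem code itself."""
    return work_per_op / main_speed


def offloaded_latency_us(work_per_op, ctrl_speed, sync_cost_us):
    """Case 2a: every logical op (namei, stat, ...) is handed to the
    controller, and the caller waits for it to finish."""
    return sync_cost_us + work_per_op / ctrl_speed


def break_even_ctrl_speed(work_per_op, main_speed, sync_cost_us):
    """Controller speed needed just to match case 1, or None if the
    per-op synchronization cost alone already exceeds the local cost."""
    local = local_latency_us(work_per_op, main_speed)
    if sync_cost_us >= local:
        return None
    return work_per_op / (local - sync_cost_us)


ops = 10_000
print(ops * local_latency_us(50, 10))          # 50000.0 (microseconds)
print(ops * offloaded_latency_us(50, 2.5, 5))  # 250000.0 (microseconds)
print(break_even_ctrl_speed(50, 10, 1))        # 12.5, faster than main_speed=10
```

	With these made-up numbers, even a 1 microsecond handoff means
	the controller must be faster than the main CPU just to break
	even, which is the pressure that leads to case 2b.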
SO:
	2b) Use a disk controller that is as fast as the main CPU.
	This may be better, although the cost has now gone up
	substantially: better memory busses, probably caches for the
	controller CPU, etc.
BUT NOW:
	You will need to make this work with more than 1 disk controller
	(which will not be trivial).  Even with 1 CPU and 1 controller,
	you have something that's looking (from a hardware viewpoint)
	more and more like a symmetric multiprocessor:

3a) Symmetric multiprocessor: shared memory, more than 1 CPU that
can run the file system, and I/O controllers that look like the
simple ones of case 1 above.
So, suppose to avoid some of the problems of memory bandwidth and such,
you say:
	3b) The disk controller(s) have their own private memory(s).
Now, if you study this case, you discover that you impact the main
CPU(s) even more, because you end up with multiple transfers from
the controller memory into main memory (again, look at sar
data to convince yourself of this).
	Even worse, with multiple controllers, you now have a fixed
	amount of buffer cache per controller, which generally
	does not work as well as 1 large memory pool that can hold
	whatever is currently being used, given equal total amounts
	of memory.
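	Here is a minimal Python sketch of the pooled-versus-partitioned
	point.  It assumes LRU replacement and a made-up, skewed workload
	(the trace, the policy, and all sizes are illustrative
	assumptions): the same 200 blocks of cache memory are either one
	shared pool or split 100/100 between two controllers.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU block cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[key] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def skewed_trace(passes=10):
    """Hypothetical workload: disk 0 cycles over 150 blocks, disk 1 over 10."""
    for _ in range(passes):
        for b in range(150):
            yield (0, b)
        for b in range(10):
            yield (1, b)


def pooled_hits(total_blocks):
    """One shared cache in main memory serves both disks."""
    cache = LRUCache(total_blocks)
    for key in skewed_trace():
        cache.access(key)
    return cache.hits


def partitioned_hits(total_blocks, n_controllers=2):
    """Each controller (one per disk here) owns a fixed slice of the memory."""
    caches = [LRUCache(total_blocks // n_controllers) for _ in range(n_controllers)]
    for disk, block in skewed_trace():
        caches[disk].access((disk, block))
    return sum(c.hits for c in caches)


print(pooled_hits(200), partitioned_hits(200))  # prints 1440 90
```

	The busy disk's 150-block working set fits in the shared pool but
	not in its controller's fixed half, so under LRU it misses on
	every access while the other controller's memory sits mostly
	idle.  Real workloads are rarely this extreme, but the direction
	is the same.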
I think the bottom line is: the only times I've ever seen this kind of
thing work were when the main CPU was out of gas, one could
more easily make a smart I/O processor help offload it,
and the nature of the restructuring was that:
	a) The controller CPU could convert numerous small interactions
	with a device, into much less frequent, but larger interactions
	with the CPU's main memory.
	b) The CPU/controller interaction was fairly minimal,
	or the controller provided a more efficient performance
	model without changing the programming model, especially
	for operating systems that don't already have disk caching
	built in as strongly as UNIX does.  (For instance, mainframe
	disk caching controllers, including battery-backed memory,
	might fit here.)
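Point a) is simple arithmetic.  The Python sketch below counts how
many times the host must be involved when a controller coalesces
512-byte device operations into 8 KB transfers.  The sector and
transfer sizes are hypothetical, and it assumes the small operations
are contiguous enough to be gathered; the function name
host_interactions is made up for illustration.

```python
import math


def host_interactions(n_requests, request_bytes, transfer_bytes=None):
    """Count controller <-> main-memory interactions (interrupt plus setup)
    for n_requests small device operations of request_bytes each.

    transfer_bytes=None: every device op is handed to the host individually.
    Otherwise: the controller gathers ops into transfers of up to
    transfer_bytes and interacts with the host once per transfer.
    """
    if transfer_bytes is None:
        return n_requests
    per_transfer = max(1, transfer_bytes // request_bytes)
    return math.ceil(n_requests / per_transfer)


print(host_interactions(1000, 512))        # prints 1000
print(host_interactions(1000, 512, 8192))  # prints 63
```

The controller does the same 1000 device operations either way; what
changes is that main memory and the main CPU see 63 larger
interactions instead of 1000 small ones.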

However, as a bottom line for UNIX systems, it often seems that
you start with dumb controllers, but if you keep making them smarter,
they rapidly turn into symmetric multis .... with dumb controllers
again...
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086