Newsgroups: comp.sys.nsc.32k Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!indetech!daver!bungi.com!news From: Bruce Culbertson Subject: Re: "Advanced MINIX"? Date: Wed, 15 May 91 10:29:59 pdt Message-ID: <9105151729.AA25218@hplwbc.hpl.hp.com> Sender: news@daver.bungi.com Approved: news@daver.bungi.com Distribution: world > Is anyone around here working on a fix for the file manager? Excuse my > ignorance concerning Minix on the 532, I just received and installed the > software on a PC (soon I will get around to mailing the disk to Bruce). > I've noticed this "single threaded file system" produces terrible response > time in a multiprogramming environment. Has anyone decided to attack this > problem? If not, that may be my first task. This topic comes up frequently. While I am not working on an FS fix, I have thought a lot about it and I even wrote a few paragraphs which describe how I would like to see FS fixed. Sometime I would like to implement my FS fix but if you find time to take a crack at it before I do, by all means go for it! Bruce Culbertson ---------------------------------------------------------------- FS works on one system call at a time (except for reads and writes to pipes and tty's). So, suppose cc forks and execs cc1. MM sends a read to FS to load the ~300K binary. The read takes several seconds. Meanwhile, suppose you are in an editor or shell which uses RAW mode (what, you're not using ed and sh?). No characters you type will be echoed during the several seconds that it takes to load cc1. This is annoying and does not make efficient use of our computing resources. A lot of people have suggested a multi-threaded FS as a solution to this problem. I am very much opposed to this. I -do- think FS has a problem and I would like to see it fixed (and, by the way, MM has a similar problem). However, "multi-threaded" is a particular solution to the problem and I think it is the wrong solution. There are a lot of ways to provide concurrency and to protect resources against concurrent access. Unix lets multiple threads execute the OS code and uses "sleeping on addresses" to protect its resources. Unix's scheme works, of course, but it is by no means the only way to build an OS. Minix uses a different scheme: it protects its resources by letting just one thread have access to each of them. (Granted, there are some exceptions.) There is nothing wrong with Minix's concurrency control mechanism; the problem occurs at a higher level in the FS code. FS divides the work it needs to do into pieces which are too big. For example, FS treats a read system call as a single indivisible piece of work which it processes to completion before making progress on other system calls. This piece of work is too big because FS may need to block in the middle of the read to wait for a disk operation. When FS blocks for a disk operation, it does not make progress on other work it could be doing. A read can be thought of as a bunch of sub-tasks: the work that needs to be done before the first disk operation, the work that needs to be done between disk operations, and the work that is needed after the last disk operation. Between each of these sub-tasks, FS needs to check to see if there is other work it could be doing. Currently, FS does not check for other work and this is the problem. I think there is a way to fix the FS problem which does not require a major rewrite of FS, does not introduce any new and redundant concurrency control mechanisms, and minimizes the number of obscure new ways for Minix to deadlock. My solution is patterned after the mechanism which allows CISC computers to interrupt string instructions. CISC computers execute each iteration of a string instruction as though it were the first iteration. All pointers and counters are updated and written back to the CPU registers at the end of each iteration. This makes it possible to interrupt the string instructions without a ton of special case microcode to save the state of the instruction. On return from an interrupt, the processing of a string instruction is the same regardless of whether some iterations have already executed. This idea could be applied to FS. Suppose FS needs to process a read. FS would save a copy of the system call message in its process table. The message is sufficient to maintain the state of the processing of the system call. In general, a sequence of disk blocks must be accessed to process the read. First, assume one of the blocks is present in the buffer cache. Data is copied from the buffer cache to the user's space and the count and pointers are updated in the copy of the system call message. If the count becomes zero, the user process can be restarted by sending it a reply message. Otherwise, the read system call can be placed on the end of a queue of work which FS is working on. On the other hand, suppose the block was not in the buffer cache. FS does a certain amount of processing before querying the cache and is many levels deep in nested subroutines when it discovers it must do a disk operation. When FS finds that the block is not in the cache, it first starts the disk operation by sending a message to the disk driver. Next, it places the read call on a queue of calls to be restarted when the block becomes valid. Finally, it longjmp's to pop all the subroutines off its stack, thereby forgetting that it has ever started processing the read call! The longjmp returns FS to its top level loop. When a message arrives indicating the disk operation is complete, the queue of calls waiting on the block is moved to the end of the queue of work which FS needs to do. The next time FS tries to process the read, it will find the block in the buffer cache, which is the case I described in the previous paragraph. Notice that when FS finds a read call on its queue of work to do, it does not need to know if it has already done some work on behalf of the call. It treats the call the same, regardless. This eliminates the need for a complicated state table and minimizes the changes which need to be made to FS. At first, it appears that this approach requires a lot of redundant work. For example, FS fetches the inode of the file before each block is read. In fact, it is necessary to repeat this processing, regardless of the FS implementation, because FS must check that another user has not removed the file in the middle of the processing of the read call. The new top-level FS loop would look like this: while (1) { while (work queue is not empty) process work on queue; receive (Message); place Message at end of work queue; } While I have left out some of the details for this FS modification, I hope it is clear that it does not change the majority of the code. It adds a few fields to the data structures in the buffer cache and FS process table. It requires some changes to FS's top level loop and it requires changes at the point where the current FS does a send-receive to block for a disk operation. With careful coding, it might even be possible to make the changes unobtrusive enough to overcome Dr. Tanenbaum's objections to fixing FS.