Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!lll-lcc!styx!ames!ucbcad!ucbvax!sdcsvax!darrell
From: darrell@sdcsvax.UUCP
Newsgroups: mod.os
Subject: Who needs files. [Really "Apollo's been doing it for 6 years"]
Message-ID: <2906@sdcsvax.UCSD.EDU>
Date: Mon, 23-Mar-87 12:51:51 EST
Article-I.D.: sdcsvax.2906
Posted: Mon Mar 23 12:51:51 1987
Date-Received: Sat, 28-Mar-87 05:07:31 EST
Sender: darrell@sdcsvax.UCSD.EDU
Lines: 61
Approved: mod-os@sdcsvax.uucp

Jim Rees's article (which may or may not have been posted yet) outlines how Apollo uses the single-level store idea to implement its distributed filesystem.

Something else to note is that we did run into performance problems with the mapped I/O model at the sequential file access level. The kernel knows nothing about sequential I/O, open files, and such, and originally did not do any read-ahead. Later (a while ago now), we added "touch ahead": Mapped segments (32K regions of virtual address space) can be marked with an integer that is the number of pages the pager should read in when any page in the segment gets faulted on. This gives read-ahead of a sort.

Sequential output was a bit trickier. There is no parallel to page faulting that occurs when you're "done" writing through a page. What we did was allow segments to be marked "flush-behind"; i.e. the physical memory pages should be treated as good candidates for re-use as soon as the segment mapped over them gets unmapped from the virtual address space.

Another change we made to make sequential output better is "grow-ahead": Note that when you're writing a new file, the pages that are mapped in do not correspond to real disk pages until they are touched. Touching one of these pages is called a "growth fault". The system now optionally grows the file by more than one page on growth faults.
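
To make the effect of grow-ahead concrete, here is a minimal Python sketch. It is a toy model, not Apollo's code: the class `GrowAheadFile`, the `grow_ahead` parameter, and the helper `write_sequentially` are names invented for illustration. It only counts growth faults and the pages allocated past the last page actually written.

```python
class GrowAheadFile:
    """Toy model of a new mapped file being written sequentially.

    Hypothetical illustration only; none of these names come from Apollo.
    """

    def __init__(self, grow_ahead=1):
        self.grow_ahead = grow_ahead   # pages allocated per growth fault
        self.allocated_pages = 0       # pages backed by (simulated) disk
        self.growth_faults = 0         # number of growth faults taken
        self.highest_touched = -1      # highest page index written so far

    def touch(self, page):
        """Write to `page`; take a growth fault if it has no disk page yet."""
        if page >= self.allocated_pages:
            self.growth_faults += 1
            # Grow far enough to cover `page`, plus the grow-ahead extra.
            self.allocated_pages = page + self.grow_ahead
        self.highest_touched = max(self.highest_touched, page)

    def extra_pages(self):
        """Pages allocated ahead of the writer but never touched."""
        return self.allocated_pages - (self.highest_touched + 1)


def write_sequentially(n_pages, grow_ahead):
    """Touch pages 0..n_pages-1 in order and return the resulting model."""
    f = GrowAheadFile(grow_ahead)
    for page in range(n_pages):
        f.touch(page)
    return f
```

Under these assumptions, writing 10 pages takes 10 growth faults at one page per fault, but only 3 faults at four pages per fault, at the cost of 2 allocated pages that were never written.
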
The Stream I/O library takes care of truncating off any extra pages when the stream is closed.

All in all, mapped I/O is a nice and useful idea, but anyone who thinks that it will perform just like a traditional sequential I/O system withOUT any special purpose features like the ones described above is whistling in the dark. Also, some of the features don't always work like you'd think/hope.

To deal with cache consistency, the file locking mechanism is tied into the remote file caching mechanism. Although it is possible to bypass the locking mechanism and map a file withOUT locking, this is strongly discouraged (i.e. not a documented feature and not used by the Stream I/O library). When a node locks a file, it contacts the home node of the file and gets back the current date-time-modified (DTM) for the file. It uses this value to determine whether any pages the using node has are still OK (i.e. whether it can avoid re-reading the pages from the home node). When the using node unlocks the file, dirty pages are sent back to the home node before the lock is released. If you use the mechanisms as intended, you NEVER get bad (stale) data. We consider this property a necessity.

Another thing to remember about the single-level store: Don't be seduced. That is, don't use it for things for which it is not intended. Just because you can access some database file all over the network doesn't mean that you should implement a DBMS this way. You probably really want to use RPC. The problem with using a file system in the "wrong" way is that the interface to file systems is just generally not designed to deal with failure -- e.g. to let you know just how much of your I/O succeeded when the network partitioned. Things can be made even worse in a mapped I/O system since any random memory reference can cause an exception to be raised. (At least when you make "normal" filesystem calls, you generally get an error returned.
You can simulate this with mapped I/O, but not always cleanly and faithfully.)

-- Nat Mishkin
   Apollo Computer Inc.
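
For readers who want the lock/DTM cache check described in the article in concrete form, here is a minimal Python sketch. It is not Apollo's implementation. The classes `HomeNode` and `UsingNode`, the integer counter standing in for a real date-time-modified, and the exclusive-lock simplification are all assumptions made for illustration. It also assumes that the unlocking node learns the new DTM from the home node so that its own cache stays valid; the article does not say whether this is the case.

```python
class HomeNode:
    """Holds the authoritative pages of one file and its DTM (hypothetical)."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page number -> bytes
        self.dtm = 0               # logical counter standing in for a DTM
        self.locked_by = None

    def lock(self, node_name):
        # Assumption: a single exclusive lock per file.
        if self.locked_by is not None:
            raise RuntimeError("file is already locked")
        self.locked_by = node_name
        return self.dtm            # the locker gets the current DTM back

    def read_page(self, n):
        return self.pages[n]

    def unlock(self, node_name, dirty_pages):
        if self.locked_by != node_name:
            raise RuntimeError("unlock by a node that does not hold the lock")
        if dirty_pages:
            # Dirty pages arrive before the lock is released.
            self.pages.update(dirty_pages)
            self.dtm += 1
        self.locked_by = None
        return self.dtm


class UsingNode:
    """A remote node that caches pages between lock/unlock pairs."""

    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.cache = {}            # page number -> bytes
        self.cached_dtm = None     # DTM the cached pages correspond to
        self.dirty = set()
        self.remote_reads = 0      # pages fetched from the home node

    def lock(self):
        dtm = self.home.lock(self.name)
        if dtm != self.cached_dtm:
            # File changed since we cached it: every cached page is suspect.
            self.cache.clear()
            self.cached_dtm = dtm

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.home.read_page(n)
            self.remote_reads += 1
        return self.cache[n]

    def write(self, n, data):
        self.cache[n] = data
        self.dirty.add(n)

    def unlock(self):
        dirty_pages = {n: self.cache[n] for n in self.dirty}
        self.cached_dtm = self.home.unlock(self.name, dirty_pages)
        self.dirty.clear()
```

In this model, a node that re-locks an unchanged file reuses its cached pages without contacting the home node for data, while a node that locks a file modified elsewhere discards its cache and re-reads, so it never sees stale pages as long as it always goes through `lock` and `unlock`.
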