Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!sirius.ucs.adelaide.edu.au!chook!gordoni
From: gordoni@chook.adelaide.edu.au (Gordon Irlam)
Newsgroups: comp.sys.encore
Subject: Umax 4.3 virtual memory problem
Keywords: Umax, paging, cow, bug
Message-ID: <1479@sirius.ucs.adelaide.edu.au>
Date: 18 Sep 90 06:03:54 GMT
Sender: news@ucs.adelaide.edu.au
Reply-To: gordoni@chook.adelaide.edu.au (Gordon Irlam)
Lines: 231
Nntp-Posting-Host: chook.ua.oz.au

Some Notes on Umax 4.3 Virtual Memory Performance
=================================================

aka "How to sell overpriced memory expansion boards."

Gordon Irlam, Adelaide University.
(gordoni@cs.adelaide.edu.au)

1990 September 18

Umax 4.3 release 4.0.0, and all previous releases of BSD Umax, contain
a serious bug in the virtual memory system that prevents it from paging
out pages of processes under certain commonly occurring circumstances.
This degrades system performance or, equivalently, increases the amount
of physical memory needed to obtain a given level of performance.  In
more extreme cases it may cause severe performance problems or even
deadlock.

Umax 4.3 is not able to page out copy on write pages.  The meaning of
this and its ramifications are explained below.

1) What is copy on write memory?
--------------------------------

Umax 4.3 has a fairly sophisticated virtual memory subsystem, although
not as sophisticated as that of Mach or SunOS.  In such systems virtual
memory is used in a very lazy fashion.  Pages are shared between
processes whenever possible, and pages are only duplicated when
strictly necessary.  Such systems page unmodified pages directly from
the file system, and only modified data pages need to be paged to and
from a swap partition.

When a process forks under Umax all of the modifiable pages of the
parent process are marked copy on write.  The same set of pages is
marked copy on write in the child process.
Because code pages are read only they can be shared without being
marked copy on write.

Marking a page copy on write means setting its protection to read
only.  If a write to that page then causes a translation fault, a copy
of the page is made, the protection on the page is set to read-write,
and the faulting instruction is re-executed.  Copy on write pages
minimize the cost of forking.

If, when a copy on write fault occurs, the copy on write page is no
longer shared with any other process, say because the child has
exited, the page is set to read-write without needing to make a copy
of it.  Note that this final giving away of a copy on write page is not
performed as soon as the page becomes owned by a single process, but
only when the last owner of the page writes to it.  If the last owner
never writes to the page it will remain copy on write despite the fact
that it is not shared with anyone else.

2) An example of the problem.
-----------------------------

Consider the following small C program.

---- start example.c ----
#define PAGE_SIZE 4096
#define MEGABYTE 1000000
#define SIZE (10*MEGABYTE)

static char space[SIZE];        /* 10M of zero-filled memory. */

main()
{
  int i;

  for (i = 0; i < SIZE; i += PAGE_SIZE)  /* Touch pages to get them in core. */
    space[i] = 'x';

  if (fork () != 0) {           /* Fork. */
    while (1) {}                /* Parent spins; child falls through and exits. */
  }
}
---- end example.c ----

This process gets 10 megabytes of data in core by touching all the
pages; otherwise they would be marked as zero fill and would not have
been created yet.  It then forks.  In forking, all of the modifiable
pages are marked copy on write.  Because copy on write pages can not be
swapped out (even if the child has exited, and the parent is the sole
owner of them), the net result is a process occupying 10 megabytes of
non-pageable memory.  Only if the process modifies the pages (reading
them is not sufficient) will they cease to be copy on write and become
eligible for paging.
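
The page lifecycle described in sections 1 and 2 can be followed with a
toy model.  This is only a sketch of the behaviour as reported in this
note, not Umax kernel code; struct page and the page_* functions are
hypothetical names invented for illustration, and the model tracks only
the original shared page, not any private copies made from it.

```c
#include <assert.h>

/* Toy model of one modifiable page.  Hypothetical, not Umax source. */
struct page {
    int owners;   /* number of processes mapping this page */
    int cow;      /* 1 while the page is marked copy on write */
};

/* A freshly touched private page: one owner, read-write. */
struct page page_new(void)
{
    struct page p = { 1, 0 };
    return p;
}

/* fork: the child maps the same page and it is marked copy on write. */
void page_fork(struct page *p)
{
    p->owners++;
    p->cow = 1;
}

/* A sharing process exits or execs.  The copy on write mark stays set. */
void page_release(struct page *p)
{
    assert(p->owners > 1);
    p->owners--;
}

/*
 * Write fault on a copy on write page.  Returns 1 if the writer had to
 * be given a private copy, 0 if the page was simply handed to its last
 * owner by making it read-write again.
 */
int page_write_fault(struct page *p)
{
    assert(p->cow);
    if (p->owners > 1) {
        p->owners--;          /* writer leaves with its own copy */
        return 1;
    }
    p->cow = 0;               /* sole owner: no copy needed */
    return 0;
}

/* The behaviour reported here: copy on write pages are never paged out. */
int page_is_pageable(const struct page *p)
{
    return !p->cow;
}
```

Walking one page through the example.c scenario: after page_new() and
page_fork(), a page_release() for the exiting child leaves the parent
as sole owner, yet page_is_pageable() still returns 0.  Only a later
page_write_fault(), which returns 0 because no copy is needed, makes
the page pageable again.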

Needless to say, it doesn't take many copies of example.c running at
once before Umax starts thrashing severely, or even deadlocks.

The same effect could have been observed in the child process if the
child hadn't exited, or if the while loop was replaced by other
computations or system calls, except obviously exec.

3) Implications for real systems.
---------------------------------

Fortunately many processes

  1) do not fork, or
  2) fork but have a reasonably small amount of data, or
  3) shortly after forking both child and parent
       a) exit, or
       b) exec, or
       c) modify nearly all their data pages, or
  4) only access a few data pages immediately prior to forking, and
     then only read a few data pages at any time subsequent to
     forking.

Those cases where these constraints are not met cause the most
problems, and to a certain extent case 4 can also cause problems.  In
case 4, where a process only touches a few pages immediately prior to
forking, if the system was heavily loaded at the time prior to the
fork, most pages will have been swapped out, and so will not end up
being locked down by the fork - unless they are subsequently read in.
But if the system was lightly loaded at the time of the fork then case
4 will still cause a large number of pages to be locked down.

The program shown above was an extreme example of the problems that
non-pageable copy on write memory can cause; however, all programs that
fork will cause problems to a certain extent.  This prevents Umax from
being able to run processes whose total virtual memory size
significantly exceeds the amount of physical memory available, even
though such processes may be idle most of the time.  Our experience is
that we can not use much more swap space than twice the physical memory
on our machines, even though many of our processes are idle for
substantial periods of time.

We had considerable difficulty when we attempted to use a Multimax as a
server for a large number of X terminals.
The machine had sufficient compute power, virtual memory, and physical
memory for the clients, but nearly all of the physical memory filled up
with non-pageable copy on write pages that weren't even being used.
Unfortunately the xterm binary was both long lived and caused a large
number of pages to be locked down for long periods of time.

4) Fixing the problem.
----------------------

Identifying the problem is fairly easy.  Sysparam will be showing the
system paging heavily, but when you do a ps you will find some pages of
processes remain in memory, even when they are idle or stopped.  In
more severe cases all of the system's memory may end up becoming
non-pageable, preventing you from even being able to login.

Unless you have enough money to afford some extra memory that is
effectively unused, there is little you can do about this problem other
than be aware of it and try to manage your job mix accordingly.

If you are desperate, however, you could try applying a fix similar to
the one we applied to one or two of the programs that caused us the
most trouble, as outlined below.  I would recommend avoiding this if at
all possible.

We reported this bug to Encore around the end of April, so hopefully
they are aware of the problem and are working on a solution.  We have
not yet received a reply from Encore.  But I believe that the problem
warrants making available a new version of /Umax.image to those sites
that need it once it has been solved.

I thought this problem was sufficiently serious to bring it to the
attention of others.  It's a pity Encore doesn't - it would be useful
if Encore posted to the net details of serious bugs when they are first
discovered, and made a more complete list of bugs available for
anonymous ftp.

5) Caveat.
----------

It is true that this message is critical of Umax.  But this doesn't
mean that I think Umax is a poor operating system.  On the contrary,
all things considered, I think Umax is quite good.
In particular I believe Encore have been very successful in
parallelizing the BSD kernel.

6) A nasty little hack.
-----------------------

The following routine can be called to make a process's pages
pageable.  To use it you will need sources to the programs you wish to
fix.  It works by writing to all of a process's data pages so that they
become exclusively owned.  Obviously this has performance ramifications
since it increases the amount of swap space used, and most likely the
amount of swap traffic that will occur.

This routine should be called from those branches of a process that
have just forked and are not about to exit or exec.

It is possible that one or two pages on the top of the stack may not
get modified by this routine, and will remain non-pageable.

---- start touch_pages.c ----
#define PAGE_SIZE 4096
#define DATA_START 0x400000
#define STACK_LIMIT 0xffffff000

/* Round an address down to the start of its page. */
#define FLOOR(p) ((char *) (((int) (p)) & ~ (PAGE_SIZE - 1)))

int zero()
{
  return (0);     /* This function is used to fool optimizers. */
}

touch_pages()
{
  char stack_start;             /* Its address marks the current stack page. */
  int nothing;
  char *start, *limit, *p;

  nothing = zero();

  start = FLOOR(DATA_START);                    /* Modify data and bss pages. */
  limit = FLOOR((int) sbrk(0) + PAGE_SIZE - 1);
  for (p = start; p < limit; p += PAGE_SIZE)
    *p = *p + nothing;          /* Write the byte back unchanged. */

  start = FLOOR(&stack_start);                  /* Modify stack pages. */
  limit = FLOOR(STACK_LIMIT);
  for (p = start; p < limit; p += PAGE_SIZE) {
    *p = *p + nothing;
  }
}
---- end touch_pages.c ----
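
DATA_START and STACK_LIMIT in touch_pages.c are specific to the Umax
address space layout.  As a minimal sketch of the same idea restricted
to an explicit address range, the routine below writes one byte back to
every page it is given.  The name touch_range and its parameters are
hypothetical, and it uses a volatile pointer rather than the zero()
trick to stop the compiler from removing the stores; it assumes a
compiler with <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write one byte, unchanged, in every page overlapping [start, end).
 * Returns the number of pages written.  page_size must be a power of
 * two.  Hypothetical helper, not part of the original fix.
 */
size_t touch_range(char *start, char *end, size_t page_size)
{
    uintptr_t addr = (uintptr_t) start;
    uintptr_t limit = (uintptr_t) end;
    size_t touched = 0;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    while (addr < limit) {
        volatile char *p = (volatile char *) addr;

        *p = *p;              /* volatile forces a real load and store */
        touched++;

        /* Advance to the first byte of the next page. */
        addr = (addr & ~(uintptr_t) (page_size - 1)) + page_size;
    }
    return touched;
}
```

Following the advice above, a process would call such a routine on its
large data areas in whichever branch survives the fork, for example
touch_range(space, space + SIZE, PAGE_SIZE) in the parent of example.c.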