Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!decwrl!orc!bbn.com!lkaplan
From: lkaplan@bbn.com (Larry Kaplan)
Newsgroups: comp.arch
Subject: Re: fork and preallocation (was Re: Paging page tables)
Message-ID: <58307@bbn.BBN.COM>
Date: 19 Jul 90 15:17:38 GMT
References: <920@dgis.dtic.dla.mil> <5830@titcce.cc.titech.ac.jp> <5DL4SPD@xds13.ferranti.com> <5855@titcce.cc.titech.ac.jp> <58184@bbn.BBN.COM> <5870@titcce.cc.titech.ac.jp> <58227@bbn.BBN.COM> <5894@titcce.cc.titech.ac.jp>
Sender: news@bbn.com
Reply-To: lkaplan@BBN.COM (Larry Kaplan)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 135

To start with, I think we've about beaten this topic into the ground. We all
have our preferences and no one is going to change them easily. :-)
Anyway, I'll address some unclear points and provide a little more
justification than I have before. I'm going to try to avoid making
deprecating (or insulting) remarks as some have done on this topic.

In article <5894@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> (stuff about my proposed swap space monitor daemon)
>
>Oops! A program? Are you joking? What if the program itself run short
>of pages?

Our kernel (and others, I am sure) allows sufficiently privileged users
(like root) to ask that some of their memory be wired (not paged). This
facility could certainly be used for this daemon. I did say this in the
original posting, but some may have overlooked it.

>>It could also be part of the kernel.
>
>More reasonable proposal. It can be a dirty and unreliable workaround.

(Another obnoxious comment.)

>I showed several justification why simple fork is not appropriate
>for fork-exec.

But the debate is COW fork vs vfork.

>On the other hand, you showed no justification not to use vfork.
>You only showed it is inelegant not to use vfork and insists on using
>fork.
>
>>There are most certainly situations where vfork() is not appropriate.
>
>Have you shown any justification to claim so? Or, is it merely your
>desire?

Let me attack this one now. Let me start by saying that the problems we have
with a regular vfork are probably fairly machine specific. However, some of
this reasoning may soon apply to other machines, if it does not already.

Our machine is a non-uniform memory architecture multiprocessor. This means
that each node has memory local to it. If you were to use vfork when forking
onto another processor, the child would be executing out of remote memory.
While caches help alleviate the performance penalty, making things local is
a much better idea. By using COW fork, we always set up a local text
(initially empty) and stack segment, and even support COR (copy on
reference) for remote memory marked INHERIT_COPY. This reason, in addition
to the other problems with vfork (such as the broken semantics people seem
to agree upon), made it fairly clear that we should get rid of it. To make
vfork set up local text anyway would eliminate most of the performance
advantage it had.

Note that even for forks onto the same processor, the text segment
manipulations aren't that expensive, since in this case you do share the
segment and only have to set up page table entries on demand (or you could
preallocate the page tables for the text).

>>>>As a side note, on the large systems I work on, we don't do preallocation
>>>>and have never run out of paging (swap) space. This is not to say that
>>>>we never will, but typical systems have on the order of at least 10 times as
>>>>much disk storage as main memory. In some cases, as much as 100 times more.
>
>>>You have 100 times more swap space because you think it may be filled,
>>>don't you.
>
>>NO. These numbers come from simply looking at the systems that people have
>>running. These numbers are true for most all computers in general.
>
>You want to say most all computers in general have 10 to 100 times more
>swap space than real memory? That is simply incorrect.
>
>Very few system have 10 times more swap space.

Please, please, please, I said disk space to real memory, not swap space.
Part of the point was that Mach is willing to page to all the disks, whether
they are true swap partitions or regular UNIX filesystems. So the ratio is
very relevant to Mach-based systems.

>>>As you may know, programs manipulating large arrays, if written properly,
>>>can use very large virtual space with little real memory without
>>>thrashing. That is why some of your system are configured 100 times
>>>more swap space, isn't it?
>
>>You are attempting to justify the reasons for the way I have my systems
>>configured, when in reality this is the way just about everybody's systems are
>>configured.
>
>The thread of the vfork discussion begins because I said your system without
>vfork is broken.
>
>Moreover, it is you, who brought the configuration of your system into
>the discussion.
>
>So, why can't I refer to your system?

You can most certainly refer to my system. But you were suggesting reasons
why my system looks the way it does, and those reasons are simply NOT true.

>>The point about sparse matrix programs may be true to some extent.
>
>Sparse matrix? I am not refering to such a thing.
>
>>Even just reads from parts of the matrix wouldn't require
>>swap space allocation since the pages aren't dirtied. This sounds quite
>>likely.
>
>You can do so with mmap specifying readonly option. A cleaver implementation
>won't allocate extra swap space.
>
>BUT, such a thing has nothing to do with fork nor vfork.

But it does have to do with preallocation and COW fork vs vfork. Again, any
mmap calls require application code changes that the non-preallocating
scheme doesn't need.

>>On inspection, it appears that one of our big systems currently in the field
>>has 1 gigabyte of physical memory and only about 6.4 gigabytes of disk storage.
>>They run sparse matrix problems. They run lots of scientific codes.
>>They have yet to have any problems with swap space.
>
>You might have been lucky, or, you might have just overlooked the
>problem.

Lucky? Maybe. Overlooked? Not possible. While this has been a very
interesting discussion, and most of the necessary primitives exist in our
kernel to support these fancy emergency schemes, none of the higher-level
pieces have been implemented yet. If they ran out of swap space, they (and
we) would know.

#include
_______________________________________________________________________________
                         ____ \ / ____
Laurence S. Kaplan       | \ 0 / |           BBN Advanced Computers
lkaplan@bbn.com          \____|||____/       10 Fawcett St.
(617) 873-2431           /__/ | \__\         Cambridge, MA 02238
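
As a footnote to the fork vs vfork argument above: a minimal sketch, not
taken from our kernel or from any posting in this thread, of the one
property of plain fork that the discussion relies on. After fork, a write
made by the child is not visible to the parent, whereas a vfork child runs
in the parent's address space until it execs or exits. The function name
fork_and_modify and the value 42 are illustrative assumptions. The sketch
only demonstrates the isolation semantics on a UNIX-like system; whether
the kernel gets there by copy-on-write, copy-on-reference, or an eager copy
is not observable from this code.

```python
import os


def fork_and_modify():
    """Fork, let the child change a value, and report what each side sees."""
    data = [0]                      # lives in the parent's address space
    read_fd, write_fd = os.pipe()   # channel for the child to report back
    pid = os.fork()
    if pid == 0:
        # Child: it has its own logical copy of `data` after fork.
        try:
            os.close(read_fd)
            data[0] = 42            # this write stays private to the child
            os.write(write_fd, str(data[0]).encode())
        finally:
            os._exit(0)             # exit without running the parent's cleanup
    # Parent: collect the child's report and exit status.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return data[0], child_value, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_modify())
```

Run as a script, this prints (0, 42, 0): the parent still sees 0, the child
saw 42, and the child exited cleanly.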