Path: utzoo!utgpu!watserv1!watmath!att!ima!bbn.com!lkaplan
From: lkaplan@bbn.com (Larry Kaplan)
Newsgroups: comp.arch
Subject: Re: fork and preallocation (was Re: Paging page tables)
Message-ID: <58227@bbn.BBN.COM>
Date: 16 Jul 90 18:07:36 GMT
References: <920@dgis.dtic.dla.mil> <5830@titcce.cc.titech.ac.jp> <5DL4SPD@xds13.ferranti.com> <5855@titcce.cc.titech.ac.jp> <58184@bbn.BBN.COM> <5870@titcce.cc.titech.ac.jp>
Sender: news@bbn.com
Reply-To: lkaplan@BBN.COM (Larry Kaplan)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 129

In article <5870@titcce.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <58184@bbn.BBN.COM> lkaplan@BBN.COM
>	(Larry Kaplan) writes:
>
>>While deadlock doesn't occur, random processes can still
>>die (of their own doing) when some malloc call fails to be able to reserve 
>>swap space.
>
>Failing malloc is very different from being killed at random.
>
>The situation is well under control. Important processes can be programmed
>to try mallocing several times with exponential back-off. Even if it dies,
>it can cleanup its environment.
>

This requires significant modification of programs.  My proposals have not 
required any change to applications, only additional system code.  A general
solution should not require old applications to be rewritten.
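
To make concrete what every "important" program would have to grow, here is a
rough sketch of retrying an allocation with exponential back-off.  It is in
Python rather than C for brevity.  The name alloc_with_backoff, the attempt
count and the delays are my own assumptions, not anything from the quoted
article, and alloc stands in for whatever allocator the program really uses.

```python
import time


def alloc_with_backoff(alloc, size, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call alloc(size), retrying on MemoryError with a doubling delay.

    Hypothetical helper: `alloc` is any callable that raises MemoryError
    when the allocation cannot be satisfied.
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            return alloc(size)
        except MemoryError:
            if attempt == attempts - 1:
                # Out of retries: re-raise so the caller can clean up and exit.
                raise
            sleep(delay)
            delay *= 2  # exponential back-off
```

Something like this has to wrap every allocation site in every program you
care about, which is exactly the rewriting I would like to avoid.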

>>Next, there are actually ways to handle the deadlock.
>
>Yes, it is always possible to resolve deadlock by human intervention.

Don't put words in my mouth.  I say below that you can write a daemon
(which is a program) to do this.  If you want to let a human do it, you can,
but that is suboptimal.

>>Even
>>if the deadlock actually occurred, you could suspend all the processes waiting
>>for swap space, and then mount some reserve filesystem.
>
>It is very strange that you have reserved filesystem available for
>swapping. You should have already allocated such a free space in advance
>as swap area.

This is only a convenience.  Depending on where you implement this code, you 
could simply check a high water paging mark, as mentioned by others.  Holding 
some portion of your swap space in reserve is just another way to give you the 
opportunity to play these games when trouble starts.  Using a high water mark
can work just as well.
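
The high water mark check itself is tiny.  A minimal sketch, where the
function name, the page counts and the 90% threshold are all assumptions of
mine for illustration and not taken from any real kernel:

```python
def over_high_water(used_pages, total_pages, high_water=0.90):
    """Return True once paging space use reaches the high water mark.

    Hypothetical helper: `used_pages` and `total_pages` would come from
    whatever accounting the kernel or daemon already keeps.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return used_pages / total_pages >= high_water
```

Whatever sits on top of this (kernel code or a daemon) starts its recovery
games when the function first returns True, while some paging space is still
free to work with.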

>>Some care would then
>>be necessary to let the important jobs finish.  It may be necessary to continue
>>jobs selectively instead of all at once, to prevent a repeat of the deadlock.
>>Even if you need some more memory to get the mounting done, you could kill 
>>some non-critical system daemon that could be started later (like lpd or 
>>something).  Later on, you could decide to restart the daemon killed earlier, 
>>and/or unmount no longer used filesystems.  Eventually, you could return to
>>normal operation.
>
>Who take care of all these things? Are you proposing to attach knowledgable
>person all the time? A person who can understand what is deadlock seems
>to be very uncommon even in this newsgroup.

Read the posting.  It says daemon.  This means a program.  It could also be
part of the kernel.  (How about not deriding the readership while you're at it.)
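
As a sketch of the policy such a daemon might apply when continuing jobs
selectively: resume the most important suspended jobs first, and only as many
as the available paging space can cover.  The tuple layout, the priorities and
the name plan_resume are hypothetical.  A real daemon would also have to stop
and continue the processes themselves (SIGSTOP and SIGCONT, for example),
which is left out here.

```python
def plan_resume(waiting, free_pages):
    """Pick which suspended jobs to continue.

    `waiting` is a list of (name, priority, pages_needed) tuples; a higher
    priority means a more important job.  Returns the names to resume, in
    order.  Everything else stays suspended for a later pass.
    """
    resumed = []
    for name, _priority, pages_needed in sorted(
        waiting, key=lambda job: job[1], reverse=True
    ):
        if pages_needed <= free_pages:
            resumed.append(name)
            free_pages -= pages_needed
    return resumed
```

With 900 free pages and jobs ("batch", 1, 400), ("db", 9, 500) and
("editor", 5, 300), this continues db and editor and leaves batch suspended
until more space is available.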

>>This is a little complicated but certainly doable and allows you to not
>>reserve swap space on memory allocation and to use a true COW fork().
>
>But it dose not worth doing so. Use vfork.

This is your opinion (if I understand the sentence).  Making statements like 
this does not address the other benefits that many people seem to like 
about COW fork.

>>People may complain that this is not a truly
>>general solution, and I would agree.
>
>Vfork is the true solution.

True??  By whose standards?  This is what we are debating.  Making statements
like this without justification is worthless.  When I say it's not general, I 
mean that there are situations where this may not be appropriate.  There are 
most certainly situations where vfork() is not appropriate.  So much for being 
a true solution.
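
One such situation, sketched in Python: the child keeps running on its own
copy of the data and modifies it without ever calling exec.  With fork() the
parent does not see the change.  A vfork() child borrows the parent's address
space until it execs or exits, so it cannot safely do this.  Python exposes no
vfork(), so only the fork() side is shown, and run_in_child is a made-up name.
This needs a Unix-like system.

```python
import os


def run_in_child(data):
    """Fork, let the child grow its private copy of `data`, and report
    (length seen by the child, length seen by the parent afterwards)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: modify our own copy and send the new length to the parent.
        try:
            os.close(read_fd)
            data.append("child-only")
            os.write(write_fd, str(len(data)).encode())
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent: read the child's answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_len = int(pipe.read().decode())
    os.waitpid(pid, 0)
    return child_len, len(data)
```

Called with a three-element list, the child reports four elements while the
parent still has three.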

>>As a side note, on the large systems I work on, we don't do preallocation
>>and have never run out of paging (swap) space.  This is not to say that
>>we never will, but typical systems have on the order of at least 10 times as
>>much disk storage as main memory.  In some cases, as much as 100 times more.
>
>You have 100 times more swap space because you think it may be filled,
>don't you.

NO.  These numbers come from simply looking at the systems that people have 
running.  These numbers hold for almost all computers in general.  It may be 
that most people can't page on all their filesystems, but that's not what I 
said.

>>I claim it is hard to fill that much disk
>>space with paging and swapping traffic and still have a usable system.  You'll
>>probably be thrashing to death long before that.
>
>As you may know, programs manipulating large arrays, if written properly,
>can use very large virtual space with little real memory without
>thrashing. That is why some of your system are configured 100 times
>more swap space, isn't it?

You are attempting to justify the reasons for the way I have my systems 
configured, when in reality this is the way just about everybody's systems are 
configured.  

The point about sparse matrix programs may be true to some extent.
These programs may or may not represent a significant portion of a machine's
workload, and so may or may not have much bearing on the techniques selected.
Depending on the reference patterns of the programs, though, if you really
never reference parts of the matrix, then preallocation of swap space is
going to prevent the program from running when it would run fine without 
preallocation.  Even reads from parts of the matrix wouldn't require
swap space allocation, since the pages aren't dirtied.  This case sounds quite
likely.
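
A toy model of that accounting, to make the difference concrete.  The page
counts and function names are invented for illustration.  It assumes, as
above, that only dirtied pages ever need swap space when there is no
preallocation.

```python
def swap_pages_needed(total_pages, dirtied_pages, preallocate):
    """Swap pages a process ties up under each policy.

    With preallocation, every page of the address space is reserved up
    front.  Without it, only pages that were actually written need backing.
    """
    if preallocate:
        return total_pages
    return len(set(dirtied_pages))


def can_run(total_pages, dirtied_pages, swap_free, preallocate):
    """True if the process fits in the free swap space under the policy."""
    return swap_pages_needed(total_pages, dirtied_pages, preallocate) <= swap_free
```

A sparse matrix spanning 1,000,000 pages that dirties only 5,000 of them
cannot start with 100,000 free swap pages under preallocation, but runs
comfortably without it.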

On inspection, it appears that one of our big systems currently in the field
has 1 gigabyte of physical memory and only about 6.4 gigabytes of disk storage.
They run sparse matrix problems.  They run lots of scientific codes.  They have
yet to have any problems with swap space.  

It may be that the 100 number is a little exaggerated.  Having disk storage 
being 10 times the amount of physical memory in large machines seems to be 
more the rule.  Disk servers, however, move more towards the 100 mark.

  
#include <std_disclaimer>
_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238