Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!steinmetz!vdsvax!barnett
From: barnett@vdsvax.steinmetz.UUCP (Bruce G Barnett)
Newsgroups: comp.unix.wizards
Subject: Re: Recovery from swap failure
Message-ID: <2461@vdsvax.steinmetz.UUCP>
Date: Fri, 4-Sep-87 06:30:08 EDT
Article-I.D.: vdsvax.2461
Posted: Fri Sep  4 06:30:08 1987
Date-Received: Sat, 5-Sep-87 20:00:26 EDT
References: <2433@vdsvax.steinmetz.UUCP> <691@spar.SPAR.SLB.COM>
Reply-To: barnett@steinmetz.UUCP (Bruce G Barnett)
Organization: General Electric CRD, Schenectady, NY
Lines: 41

Re: my recovery from swap failure.

I have enjoyed the few suggestions I have received, but I believe
there is no solution to the situation I described.

Remember - this is with a vendor's simulation program, so I can't
hack the sources. (I will complain to the vendor about check-pointing.)

Even if I could, there would still be the problem of recovering from a swap failure.

To wit:
	Swap partition = 100 Meg
	Job A runs for 20 hours - allocates (say) 80 Meg
	. . .
	Job B (but same program as A) starts up, allocates 19 Meg
	. . .
	Job A needs 2 Meg more virtual memory - fails - aborts - riots start
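
The arithmetic above can be sketched in a few lines of C. This is only a
toy model, assuming swap behaves as one shared pool counted in megabytes;
struct swap_pool and swap_reserve() are names made up for illustration and
say nothing about how any real kernel accounts for swap.

```c
/* Toy model (assumption): swap is one shared pool, sizes in megabytes. */
struct swap_pool {
    long total_mb;   /* size of the swap partition */
    long used_mb;    /* megabytes already handed out to jobs */
};

/*
 * Try to reserve `mb` megabytes from the pool.
 * Returns 1 on success, 0 if the request does not fit.
 */
int swap_reserve(struct swap_pool *p, long mb)
{
    if (mb < 0 || p->used_mb + mb > p->total_mb)
        return 0;
    p->used_mb += mb;
    return 1;
}
```

With a 100 Meg pool, reserving 80 (Job A) and then 19 (Job B) both succeed,
leaving 1 Meg free, so Job A's later request for 2 Meg is the one refused.
The job that fails is the one with 20 hours of work behind it.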

Without check-pointing, it does no good for Job A to suspend: a suspended
job still holds its swap space. Job B will continue until it too runs out
and suspends, then Job C will start, suspend, etc.

Perhaps the software could detect a malloc failure and, given some
parameter specified by the user, suspend or abort the job (small
jobs abort, big jobs suspend - or oldest job suspends, newest job
aborts).
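
A minimal sketch of what such a policy might look like, if one had the
sources. Everything here is hypothetical: oom_policy(), job_malloc() and
the threshold parameter are illustrative names, and the size-based rule is
just one of the two policies mentioned above. A job that stops itself still
holds its swap; it is only waiting for someone to send SIGCONT once other
jobs have finished.

```c
#include <stdlib.h>
#include <signal.h>

enum oom_action { OOM_ABORT, OOM_SUSPEND };

/*
 * Hypothetical policy: a job already holding at least threshold_mb
 * is "big" and suspends; anything smaller aborts.
 */
enum oom_action oom_policy(long held_mb, long threshold_mb)
{
    return held_mb >= threshold_mb ? OOM_SUSPEND : OOM_ABORT;
}

/*
 * malloc wrapper that applies the policy when allocation fails.
 * held_mb is the caller's own estimate of what the job holds so far.
 */
void *job_malloc(size_t n, long held_mb, long threshold_mb)
{
    void *p;

    if (n == 0)
        n = 1;                 /* malloc(0) may legitimately return NULL */

    while ((p = malloc(n)) == NULL) {
        if (oom_policy(held_mb, threshold_mb) == OOM_ABORT)
            abort();           /* small job: give up */
        raise(SIGSTOP);        /* big job: stop until sent SIGCONT */
        /* resumed: loop around and try the allocation again */
    }
    return p;
}
```

The retry after SIGCONT only helps if something else has released swap in
the meantime, which is exactly the weakness described above.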

As it turns out - we have a viable solution - multiple simulation machines!
I will most likely implement:
	All simulation jobs go into a queue
	Big jobs go to the large machine
	Small jobs go to the large machine if it is idle
	Otherwise, small jobs go to the small system(s).
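
The routing rule above could be sketched roughly as follows. The names and
the idea of a single size cutoff (big_job_mb) are assumptions for the sake
of illustration; how a job's size is estimated before it runs is left open.

```c
enum machine { BIG_MACHINE, SMALL_MACHINE };

/*
 * Decide where a queued job should run.
 *   job_mb           estimated memory need of the job (assumed known)
 *   big_job_mb       cutoff at or above which a job counts as "big"
 *   big_machine_idle nonzero if the large machine has nothing running
 */
enum machine route_job(long job_mb, long big_job_mb, int big_machine_idle)
{
    if (job_mb >= big_job_mb)
        return BIG_MACHINE;          /* big jobs always go to the large machine */
    if (big_machine_idle)
        return BIG_MACHINE;          /* small job, large machine free: use it */
    return SMALL_MACHINE;            /* otherwise a small system takes it */
}
```

A big job routed to a busy large machine would simply wait in the queue;
the sketch does not model the queue itself.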

Someone here has MDQS, which I will look into. Any (additional) ideas
or suggestions will be appreciated.


-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett