Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!apple!agate!shelby!portia.stanford.edu!elaine23.stanford.edu!dhinds
From: dhinds@elaine23.stanford.edu (David Hinds)
Newsgroups: comp.sys.sgi
Subject: Re: Swap questions
Message-ID: <1990Dec1.021608.24498@portia.Stanford.EDU>
Date: 1 Dec 90 02:16:08 GMT
References: <9011150232.AA00704@koko.pdi.com> <1990Nov28.163415.14317@odin.corp.sgi.com> <1539@contex.UUCP>
Sender: news@portia.Stanford.EDU
Organization: Stanford University - AIR
Lines: 27

In article <1539@contex.UUCP> james@contex.UUCP (James McQueston) writes:
>
>Example: your server is used to run a simulation that takes hours or days to
>compute, and you have tuned the size of your finite-element mesh to just
>barely fit within the capabilities of that machine. N hours later, someone
>else innocently runs some unimportant program on the server and causes page
>deadlock. The O.S. blindly decides which process to kill and ... pow! Chance
>determines that the simulation gets killed and you lose N hours of work.
>Too bad that the other user was just checking his mail.

We had a bad thing happen yesterday that I think was a result of this
problem. My advisor has written a graphics program for manipulating the
results of protein molecular dynamics calculations, that reads entire
dynamics trajectories into memory. It is written in Fortran, and has huge
static zero-initialized data areas - it takes about 48MB of virtual memory.
We have 32MB of main memory and 48MB of swap space presently. Yesterday,
someone started up this program and started reading in an MD dataset, and
walked away. When she came back, the machine was apparently deceased. The
mouse cursor could still move around the screen, but the buttons and console
keyboard were useless. We couldn't get any response from the system over the
network. We had to power down to reset things, and I lost a simulation that
had logged about 120 hours of CPU time.
I can only guess that when the virtual memory limit was reached, something
important was killed that crippled the system. This was under 3.3.1, by the
way.

 -David Hinds
  dhinds@cb-iris.stanford.edu