Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!uunet!mcsun!ukc!cam-eng!!tpm
From: tpm@eng.cam.ac.uk (tim marsland)
Newsgroups: comp.sys.hp
Subject: why do cluster clients panic after cluster servers die
Keywords: clusters nfs
Message-ID: <7752@rasp.eng.cam.ac.uk>
Date: 8 May 90 17:45:54 GMT
Sender: tpm@eng.cam.ac.uk
Reply-To: tpm@eng.cam.ac.uk (tim marsland)
Organization: Cambridge University Engineering Department, UK
Lines: 21

We just had an NFS fileserver crash bring down an entire cluster
because (it seems) the cluster server got locked up waiting for the
(hard mounted) server to come back up, and the cluster clients
panicked after 30 seconds of waiting.

We're going to investigate the circumstances some more, and try out
ways around this particular problem, but it reminded me of the
question I'd been meaning to ask an HP wizard for a while which is:

Why _does_ a cluster client invoke panic() when its cluster server
stops responding? Why does it simply not sleep-and-retry? If it's
important to sync everything at cluster server boot time, then why
not get the cluster server to reboot its clients whilst it's coming
up?

Apologies in advance if this is a frequently asked question, or is
already somewhere in the fine manual. Just curious.

tim marsland,
information engineering division,
cambridge university engineering dept.