Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!julius.cs.uiuc.edu!apple!agate!shelby!msi.umn.edu!cs.umn.edu!poincare.geom.umn.edu!slevy
From: slevy@poincare.geom.umn.edu (Stuart Levy)
Newsgroups: comp.sys.sgi
Subject: Re: problems with automount
Message-ID: <1991Jan24.025719.8968@cs.umn.edu>
Date: 24 Jan 91 02:57:19 GMT
References: <9101231344.AA15399@smithkline.com>
Sender: news@cs.umn.edu (News administrator)
Organization: Geometry Group, University of Minnesota
Lines: 35
Nntp-Posting-Host: poincare.geom.umn.edu

In article <9101231344.AA15399@smithkline.com> dixons%phvax.dnet@SMITHKLINE.COM writes:
>Has anyone been using automount with Irix 3.3.1 on multiprocessor machines?
>...
>First observed problems usually are with df...
>Then gradually (over about an hour or so) other disk related things
>(like ls and pwd) begin to hang. When a process hangs in this case, you can't
>interrupt it or kill it although you can ^Z it and leave it in the background.
>Eventually you can no longer log in to the system (although compute bound jobs
>seem to continue to run) and the only thing to do is push the reset button....
>If anyone else is seeing similar problems, or is running automount fine
>on MP machines, I would like to hear about it.
>
>Scott Dixon (dixons@smithkline.com)

We're running 3.3.1 on an MP machine using NFS, with *NO* automounting and no
lockd/statd, but I think we have occasional problems similar to yours.

Some disk-related things hang while others keep working for a while, as more
and more shell windows gradually wedge. Existing programs (clock, NeWS
interaction) seem to keep running. It's impossible to log in: network daemons
respond, but logins hang before reaching a shell prompt. inetd-started daemons
still answer for a while, then TCP connections stop opening (maybe when the
listen() queues fill?). No "NFS server not responding" messages appear.

I keep thinking some important inode ("/"?) is getting locked, but it's hard
to tell. It happens fairly quickly: "a while" on our system tends to be ~5
minutes rather than an hour.

SGI support was sympathetic, but it was hard to pin anything down. Since
talking to them we've started running a network daemon that lets us cause a
panic remotely, i.e. force a crash dump. (Anyone wanting this daemon, let me
know.) The syndrome has recurred once since then; I haven't yet shown SGI the
resulting dump, and I can't see how to get much out of it with dbx -k.

Stuart Levy, Geometry Group, University of Minnesota
slevy@geom.umn.edu