Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!decwrl!sgi!shinobu!fido.asd.sgi.com!moose.asd.sgi.com!jwag
From: jwag@moose.asd.sgi.com (Chris Wagner)
Newsgroups: comp.sys.sgi
Subject: Re: 4D-200 series hangs frequently
Message-ID: <1991Jun21.022810.7255@fido.asd.sgi.com>
Date: 21 Jun 91 02:28:10 GMT
References: <84172@bu.edu>
Sender: news@fido.asd.sgi.com (Usenet News Admin)
Organization: Silicon Graphics, Research & Development
Lines: 62

In article <84172@bu.edu>, jdh@pub.bu.edu (Jason Heirtzler) writes:
|> We have a problem with our 4D-220 and 4D-240 series machines
|> hanging a lot. The symptoms vary from just the window system
|> locking up (you can still rlogin in) to sometimes the whole
|> machine will hang -- and with five 4D-200 series machines, we
|> probably average one machine hung each day. Sometimes, when
|> the machine(s) keep running, various combinations of running
|> /etc/gl/restart_gl and using the window system "hot key" sequence
|> (F12-/-whatever) will return the console to normal. But then,
|> the other times, only /etc/reboot (or pressing "reset") will do.
|>
|> After numerous calls to the hotline, there's been no improvment,
|> and I'm sure I've personally installed every "dot dot" release
|> since the early 3.1 days. Everything is running release 3.3.2
|> at the moment, and I'm waiting for my latest call to be returned
|> with another "Gee.. dunno.. have you tried 3.3.3?"
|> --

The problems you describe most likely stem from a few different
issues. When trying to improve things, it is usually important to
start classifying the 'hangs'. For example, if the graphics wedges
and the rest of the system (network, etc.) seems OK, then look in
/usr/adm/SYSLOG for any messages from the graphics hardware, and do
some ps listings to see whether a particular process that is doing
graphics is usually present.

As for the entire machine hang, again, classifying the problems can
help to zero in on the cause:

1) any NFS hard mounts?
2) any suspicious entries in SYSLOG (like disk errors)?
3) can you ping it?
4) are the front panel LED digits blinking?
5) can you rsh in (not necessarily rlogin)?

There are also some statistics that may help, like running netstat -m
to determine network memory usage and sar to determine system load.
Sometimes these statistics can help characterize what you're doing
that's slightly different from others and is therefore bringing out
some bug (software or hardware).

I would also suggest running the ecc(1) command to be sure that your
memory is OK. Listings of the number of users (and how they are
logged in: telnet, rlogin, ftp?) are also useful.

This data should be able to help the hotline - keep bugging them!!!

(and by the way, have you tried 3.3.3? :-)

----
Chris Wagner (jwag@sgi.com)
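
One way to keep the classification consistent from one incident to the next is to record the same few observations each time and bucket them. The sketch below is only an illustration: the function name, the three yes/no inputs, and the bucket labels are all hypothetical and are not taken from any SGI tool. It assumes you check by hand whether the machine answers ping, whether rsh works, and whether the console graphics still respond.

```python
from collections import Counter


def classify_hang(ping_ok, rsh_ok, graphics_ok):
    """Bucket one incident from three manual observations.

    The labels are illustrative only; they describe what was observed
    and make no claim about the underlying cause.
    """
    if not ping_ok:
        return "no-ping"            # machine does not answer ping at all
    if not rsh_ok:
        return "ping-but-no-rsh"    # answers ping, but rsh fails
    if not graphics_ok:
        return "graphics-only"      # network side fine, window system wedged
    return "no-hang"


def tally(incidents):
    """Count how often each bucket occurs across recorded incidents."""
    return Counter(classify_hang(*obs) for obs in incidents)


# Hypothetical log of (ping_ok, rsh_ok, graphics_ok) observations.
incidents = [
    (True, True, False),
    (True, True, False),
    (False, False, False),
]
print(tally(incidents))
```

A tally like this, together with the SYSLOG excerpts and the netstat -m and sar output from around each incident, gives the hotline something concrete to compare across machines.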