Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!brutus.cs.uiuc.edu!jarthur!elroy.jpl.nasa.gov!ames!sgi!brendan@illyria.wpd.sgi.com
From: brendan@illyria.wpd.sgi.com (Brendan Eich)
Newsgroups: comp.sys.sgi
Subject: Re: Intermittent Login Problems
Message-ID: <52084@sgi.sgi.com>
Date: 28 Feb 90 08:34:41 GMT
References: <52878@bu.edu.bu.edu> <1990Feb27.171242.7976@hellgate.utah.edu>
Sender: brendan@illyria.wpd.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 57

In article <1990Feb27.171242.7976@hellgate.utah.edu>, brian@cs.utah.edu (Brian Sturgill) writes:
> >  [. . .] The behavior seems random.   The only unusual message that I could 
> >  find in SYSLOG was:
> > 
> >  Feb 21 14:41:22 panda grcond[10521]: In limbo
> >  Feb 21 14:42:07 panda grcond[10521]: Tried and failed 3 times to download 
> >  graphics subsystem
> > 
> >  I asked our usual service person and the SGI hotline people and nobody had 
> >  seen this message before.
> 
> The main idea I get is that it is odd that SGI does not know about this
> problem.  ALL of our 4D/20's, and our 240GTX have this problem.

Do you get the "Tried and failed 3 times to download graphics subsystem"
message on all of your 4D/20's, or only some?  On your 240GTX?  I ask
because very different versions of the grcond program are shipped for
different models, according to their graphics hardware, and >only the
240GTX version contains the "Tried and failed" message<.

Has someone inadvertently copied the 240GTX's /etc/gl/grcond to a 4D/20?
Or does the message you quote in fact occur only on your 240GTX?
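
One rough way to check each machine, sketched in Python.  This assumes the
message is stored in the binary as a plain ASCII string; the helper name
binary_contains is made up for illustration.

```python
def binary_contains(path, text):
    """Return True if the file at `path` contains `text` as raw ASCII bytes."""
    with open(path, "rb") as f:
        return text.encode("ascii") in f.read()

# Hypothetical usage, run on each machine in turn:
#   binary_contains("/etc/gl/grcond", "Tried and failed")
```

If this returns True on a 4D/20, that machine is probably running a grcond
meant for different graphics hardware.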

> Looking at our SYSLOGs shows that this occurs 4.51 times per machine per day.
> Often just before the limbo message we get:
> 
> 	... grcond[5015]: Child process /etc/gl/pandora exited with status 0

This SYSLOG entry was intended to be informational (LOG_INFO) only, and does
not necessarily indicate a problem.  Logging successful exit status does not
seem useful; perhaps this unduly alarming message should be eliminated.
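
As a rough illustration of that idea (this is not the grcond source), a
daemon could stay quiet on a clean exit and log only abnormal ones.  The
sketch uses Python's standard syslog module; the function names, and the
choice of LOG_ERR for the non-zero case, are assumptions made for the
example.

```python
import syslog

def priority_for_exit(status):
    """Pick a syslog priority for a child's exit status, or None to stay quiet."""
    if status == 0:
        return None            # successful exit: nothing worth reporting
    return syslog.LOG_ERR      # assumed priority for any abnormal exit

def report_child_exit(path, status):
    # Hypothetical helper: log only when the child exited abnormally.
    prio = priority_for_exit(status)
    if prio is not None:
        syslog.syslog(prio, "Child process %s exited with status %d" % (path, status))
```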

> I do not know if the exact same mechanism is responsible, but we also
> had the graphics servers crash so frequently (leaving a very large /core) that
> I installed /core as a symlink to /dev/null.

By "the graphics server", do you mean /bin/news_server?  Was there any SYSLOG message
from news_server (rather than from grcond) at the time of the coredump?
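
To separate one program's SYSLOG entries from another's, something like the
following Python sketch might help.  It assumes lines shaped like the ones
quoted in this thread ("Feb 21 14:41:22 panda grcond[10521]: In limbo") and
that news_server logs under the tag "news_server"; both the regular
expression and the function name are illustrative guesses.

```python
import re

# Matches lines such as:
#   Feb 21 14:41:22 panda grcond[10521]: In limbo
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<prog>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<msg>.*)$"
)

def messages_from(lines, prog):
    """Return (timestamp, host, message) for every line logged by `prog`."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("prog") == prog:
            found.append((m.group("stamp"), m.group("host"), m.group("msg")))
    return found

# Hypothetical usage; the path to SYSLOG depends on the installation:
#   with open(syslog_path) as f:
#       hits = messages_from(f, "news_server")
```

Comparing the timestamps returned for "news_server" against those for
"grcond" would show whether the two kinds of failure coincide.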

> It seems odd that it is not occurring regularly at SGI on their machines.
> (Perhaps they have not upgraded to 3.2 yet?)

We're running 3.2, 3.2.1, 3.2.2, and what will become 3.3 in engineering,
on hundreds of Iris 4Ds.  Generally, engineers install and run a release
long before any customers see it.

The only troubles I've had with news_server, grcond, and microcode have
been during development, when I used mismatched versions.  I've heard of,
but not seen, GT/GTX microcode problems that occasionally result in SYSLOG
messages and graphics crashes.  I've had no such problems with my 4D/20 in
more than a year; I've been running 3.2 for about six months.

> Brian

Brendan Eich
Silicon Graphics, Inc.
brendan@sgi.com