Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!mcvax!ukc!acorn!john
From: john@acorn.co.uk (John Bowler)
Newsgroups: comp.windows.x
Subject: Re: rgb database corruption
Summary: rgb database corruption - a possible explanation
Message-ID: <796@acorn.co.uk>
Date: 16 Jun 89 12:41:45 GMT
References: <8906142323.AA04774@expire.lcs.mit.edu>
Organization: Acorn Computers Limited, Cambridge, UK
Lines: 62

In article <8906142323.AA04774@expire.lcs.mit.edu>, rws@EXPO.LCS.MIT.EDU writes:
> Here's an unofficial diff fragment to server/os/4.2bsd/osinit.c that might or
> might not cause this problem to disappear.  You can try it if you are being
> pestered by this problem.  You should probably ignore it if you aren't.
> Your mileage may vary.
> 
> [Patch - most omitted]
> ! 	    if (!(err = fopen (fname, "a+")))
> ! 		err = fopen ("/dev/null", "w");
> ! 	    if (err && (fileno(err) != 2)) {
> ! 		dup2 (fileno (err), 2);
> ! 		fclose (err);
> ! 	    }

This fixes one obvious problem, but that problem (stderr ending up connected
to a file descriptor other than fd 2) is not the only possible cause of rgb
database corruption.  I have been running with appropriately fixed R2 code
and have still observed these symptoms.  For my code to fail, a subsequent
open of /dev/null must also fail.  I conclude that this must be happening
(very rarely) on the systems I use, and I notice that the above code will
still go wrong if the fopen ("/dev/null", "w") fails.

For the database to be corrupted (given the normal installation mechanism)
the server must be running as root and (at least) the open of /usr/adm/X?msgs
or the subsequent dup2 must fail.  Assuming the directory /usr/adm exists,
the only likely reason for failure on a bsd (or bsd-tahoe) system is that
the kernel file table has filled up, which will tend to mean that all the
opens fail together.
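
To illustrate why a missing fd 2 is dangerous: open() returns the
lowest-numbered free descriptor, so if fd 2 is not in use the next file
opened (in the server's case, possibly the database) lands on fd 2, and
anything written to fd 2 then goes into that file.  The following is only a
sketch of that behaviour, not server code; the function name is hypothetical
and /dev/null stands in for the database file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the X server source): close fd 2, open a
 * file, and report which descriptor the kernel hands back.  fd 2 is
 * restored before returning if it was open to begin with.
 */
int fd_reused_after_closing_stderr(void)
{
    int saved, got;

    saved = dup(2);             /* keep a copy so stderr can be restored */

    close(2);                   /* simulate a server started without fd 2 */
    got = open("/dev/null", O_WRONLY);  /* stand-in for the database open */

    /* The saved copy is still open, so it cannot be handed out again. */
    assert(got < 0 || got != saved);

    if (got >= 0)
        close(got);

    if (saved >= 0) {
        dup2(saved, 2);         /* put the real stderr back on fd 2 */
        close(saved);
    }
    return got;
}
```

In a process started with fds 0 and 1 open, this should return 2, which is
exactly the descriptor that error output is written to.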

I reckon the server should check both ``err'' and fileno(stderr) and, if either is
wrong, it should give up.  Of course, I'm biased - Acorn's customers received X
binaries on 50MByte discs (so there was no possibility of fitting the source on).
If they manage to corrupt their rgb database they can do nothing about it short of
going to the level 0 backup which they did, of course, make as soon as they got the
system...
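
As a rough sketch of the check I mean (the helper names are hypothetical and
this is not taken from osinit.c): treat the error log as usable only if the
stream was actually opened and stderr really is on fd 2, and otherwise exit
rather than carry on.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `err' was opened successfully and
 * stderr is connected to file descriptor 2, otherwise 0.
 */
int error_log_is_sane(FILE *err)
{
    if (err == NULL)            /* every attempt to open a log failed */
        return 0;
    if (fileno(stderr) != 2)    /* stderr is on the wrong descriptor */
        return 0;
    return 1;
}

/* Hypothetical wrapper: give up at once instead of risking the database. */
void give_up_unless_sane(FILE *err)
{
    if (!error_log_is_sane(err))
        _exit(3);
}
```

In a normally started process, assert(error_log_is_sane(stderr)) holds and
assert(!error_log_is_sane(NULL)) holds; the server would call
give_up_unless_sane(err) after its attempts to open the log.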

The following code fragment (**NOT guaranteed - bsd specific - caveat emptor**)
should work.  A better fix, for those with access to the ndbm package, is to
hack oscolor.c and osinit.c to open the database O_RDONLY (if only it were
possible to cause dbm to open the database read-only - but even making the
files read-only doesn't help under bsd; the super user can always write to them).

    /*
     * This is done in this nasty way to ensure that the correct file descriptors end
     * up connected to the correct place.
     */
    /* Zap stdin and stdout */
    if (freopen("/dev/null", "r", stdin) == NULL)
	_exit(2);
    if (freopen("/dev/null", "w", stdout) == NULL)
	_exit(2);

    /* See if stderr is a reasonable stream, if it is assume it is ok */
    if (fcntl(2, F_GETFD, 0) == (-1)) {
	char fname[MAXPATHLEN+1];

	sprintf (fname, ADMPATH, display);
	/* Give up unless one open succeeded AND stderr really is fd 2 */
	if ((freopen(fname, "a+", stderr) == NULL &&
	     freopen("/dev/tty", "a+", stderr) == NULL &&
	     freopen("/dev/console", "a+", stderr) == NULL) ||
	    fileno(stderr) != 2)	/* Could output error message here */
	    _exit(3);
    }