Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!spool.mu.edu!agate!riacs!pioneer.arc.nasa.gov!lamaster
From: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.unix.wizards
Subject: Re: Mysterious Sun-4 bug
Keywords: Sun-4/490, Sybase, Process Hanging in "D" state
Message-ID: <1991Jun26.223311.15591@riacs.edu>
Date: 26 Jun 91 22:33:11 GMT
References: <1991Jun25.174729.11481@StarConn.com> <338@devnull.mpd.tandem.com>
Sender: news@riacs.edu
Reply-To: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Organization: RIACS, NASA Ames Research Center
Lines: 44

We are experiencing a peculiar bug which has appeared from time to time
on our Sun-4/490 server. This system is very heavily loaded, mainly
because it is running Sybase. It recently had an official clean 4.1.1
release installed, with DBE 1.1, and selected patches added. The system
has a Sun VME FDDI board, and FDDI 1.1 is installed. There is heavy NFS
traffic to another Sun server via FDDI (at the moment - Ethernet has also
been used).

The bug has appeared in 4.1, 4.1 + various patches (almost 4.1.1), and
4.1.1, with and without DBE installed, and with and without FDDI (i.e.,
with NFS traffic over Ethernet).

The same symptom has appeared in all cases: a process which is usually
doing NFS I/O will hang in "D" state. The offending process cannot be
killed, and eventually other processes start hanging as well. During this
period, Sybase activity will have been very heavy. The Sybase dataserver
process itself, however, never hangs (note: Sybase is set up so that its
I/O is local, *and* Sybase is using its own raw partitions).

Even though Sybase itself never hangs, *if Sybase asynch. I/O is turned
OFF, the problem rarely if ever appears.*

So, to cause the hang, you seem to need:

    Sybase, with asynch I/O on.
    A heavy Sybase load.
    Another process doing NFS reads/writes...

Oh yes. It seems to take a while to get into this predicament. After the
inevitable reboot, the system is usually OK for a while.
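For anyone wanting to check whether they are seeing the same symptom, here is a minimal sketch of a filter that picks "D"-state processes out of saved `ps` output. It is only an illustration: the function name `find_uninterruptible`, the column names it looks for (`PID`, `STAT` or `S`), and the sample listing are all assumptions, not output from our system, and real `ps` layouts vary.

```python
def find_uninterruptible(ps_text):
    """Return (pid, command) pairs for processes whose state begins with 'D'.

    Assumes a whitespace-separated listing whose first line is a header
    containing a PID column and a state column named STAT or S, with the
    command as the last column. These names are assumptions.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return []

    header = lines[0].split()
    stat_col = next(
        (i for i, name in enumerate(header) if name in ("STAT", "S")), None
    )
    pid_col = header.index("PID") if "PID" in header else None
    if stat_col is None or pid_col is None:
        raise ValueError("header has no PID or state column")

    hung = []
    for line in lines[1:]:
        # Split into at most len(header) fields so the command keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= max(stat_col, pid_col):
            continue  # short or malformed line; skip it
        if fields[stat_col].startswith("D"):
            hung.append((int(fields[pid_col]), fields[-1]))
    return hung


# Hypothetical listing, made up for illustration only.
SAMPLE = """\
  PID TT STAT  TIME COMMAND
  101 ?  S     0:05 dataserver
  202 ?  D     0:01 cp /nfs/a /nfs/b
  303 p0 R     0:00 ps ax
"""

if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

Running something like this periodically against saved listings would at least show when the first process gets stuck and which others follow it.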

Has anyone else experienced this problem? It could be an NFS problem, an
asynch I/O problem, a load-dependent kernel problem, ...

Any help would be much appreciated.

--
  Hugh LaMaster, M/S 233-9,     UUCP:      ames!lamaster
  NASA Ames Research Center     Internet:  lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035       With Good Mailer: lamaster@george.arc.nasa.gov
  Phone:  415/604-1056          #include