Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!spool.mu.edu!agate!riacs!pioneer.arc.nasa.gov!lamaster
From: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.unix.wizards
Subject: Re: Mysterious Sun-4 bug
Keywords: Sun-4/490, Sybase, Process Hanging in "D" state
Message-ID: <1991Jun27.162631.25647@riacs.edu>
Date: 27 Jun 91 16:26:31 GMT
References: <1991Jun25.174729.11481@StarConn.com> <338@devnull.mpd.tandem.com> <1991Jun26.223311.15591@riacs.edu>
Sender: news@riacs.edu
Reply-To: lamaster@pioneer.arc.nasa.gov (Hugh LaMaster)
Organization: RIACS, NASA Ames Research Center
Lines: 53

I previously wrote:

>The bug has appeared in 4.1, 4.1 + various patches (almost 4.1.1), 4.1.1,
>with and without DBE installed, with and without FDDI (i.e., with NFS
>traffic over ethernet).  The same symptom has appeared in all cases:
>a process which is usually doing NFS I/O will hang in "D" state.  The
>offending process cannot be killed, and eventually other processes
>start hanging as well.  During this period, Sybase activity
>will have been very heavy.  The Sybase dataserver process itself, however,
>never hangs (note: Sybase is set up so that its I/O is local, *and*
>Sybase is using its own raw partitions).  Even though Sybase itself
>never hangs, *if Sybase async. I/O is turned OFF,
>the problem rarely if ever appears.*

1)  We are not running with /tmp in swap with tmpfs.  However, I
understand that this can cause a similar-sounding problem, which
may be related.  It could be a bug somewhere in the allocation of
swap space.

2)  I should have made it clear that the Sybase raw partitions are
local to the machine running Sybase, and that the database files are
not accessed over NFS.  Only user-type files are mounted from the
fileserver using NFS.  Also, lockd and statd are not running.  I
believe that there is no need for them to be running, since Sybase
is not reading or writing over NFS, and is not complaining about lock
requests failing.
3)  We had another hang yesterday afternoon.  The processes which
hung this time looked like the following:

       F UID   PID  PPID CP PRI NI  SZ  RSS WCHAN    STAT TT  TIME COMMAND
200080001002  9562  9542  0  -1  0149376    0 kernelma DW   pa  0:00 model
200080011002  9529  4227  0  -1  0149376   72 kernelma D    pb  0:00 model

A pstat -Ts showed the following:

[149] pstat -Ts
>pstat: number of files is preposterous (14019)
>1470/1470 inodes
>454/4090 processes
>460952/781032 swap
>

We have a lot of swap space allocated, to run some of these big jobs.

-- 
  Hugh LaMaster, M/S 233-9,     UUCP:              ames!lamaster
  NASA Ames Research Center     Internet:          lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035       With Good Mailer:  lamaster@george.arc.nasa.gov
  Phone:  415/604-1056          #include