Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!genrad!decvax!ucbvax!UTHSCSA.BITNET!BITNET
From: BITNET@UTHSCSA.BITNET.UUCP
Newsgroups: mod.computers.vax
Subject: Thanks and system hangs.
Message-ID: <8702272240.AA26096@ucbvax.Berkeley.EDU>
Date: Fri, 27-Feb-87 17:40:49 EST
Article-I.D.: ucbvax.8702272240.AA26096
Posted: Fri Feb 27 17:40:49 1987
Date-Received: Sun, 1-Mar-87 12:11:21 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 51
Approved: info-vax@sri-kl.arpa

I have been a computer programmer for a little over two years, and I have subscribed to info-vax for a little over a year now. I was recently promoted to a position in our systems group. I would like to say thank you to all the people out there (especially Jerry) for all the great info. I can honestly say that if I didn't have the knowledge acquired from reading info-vax, I probably would not have gotten my current position.

Now for my question. We are experiencing a strange problem on one of the Vaxen in our cluster (consisting of 2 11/750s, 1 11/785, and 1 8650). We recently increased the number of logins allowed on our 8650 from 64 (the default) to 120. Shortly thereafter, the system started to hang with about 70 or 80 users logged in. The system would slow down and processes would be placed in the RWSWP state. These processes would hang completely. Eventually the whole system would hang and no one could do anything, not even at the console. We did a ^P and @CRASH at the console.

I am still learning how to use SDA and analyze a crash dump, but here is what I found. When I did a 'SHOW SUM/IMAGE' in SDA, some of the processes showed 'No Image Name Available'. This is not the same as a process that is not currently executing an image (i.e. sitting at DCL); these processes simply had no image name. Most of the processes in the RWSWP state were running ALL-IN-1, a DEC office automation package.

When I set 'RMS=ALL' in SDA and did 'SHOW PROC/RMS/INDEX=' for the processes in the RWSWP state, they were all accessing DUA6:. We use DUA6: for the ALL-IN-1 shared areas and for our secondary page and swap files. The bitmask 'BKPBITS' had the following bits set for the majority of the processes in the RWSWP state: BUSY, ACCESSED, RMS_STALL, STALL_LOCK.

The last time the system started doing this, I was able to do a 'SHOW MEM' from DCL. Our swap file was more than 90% used, but the page file had plenty of free space. The best explanation we could come up with is that our swap files don't have enough space. We increased our secondary swap file from 50,000 blocks to 100,000 blocks; so far, so good. Could it be something else?

1. What do the BKPBITS bits mean, and are these values normal?
2. What is RWSWP? (My guess is Resource Wait SWaPped.)
3. Where can I learn more about how to interpret what I get from SDA?
4. If this was the problem, how can I determine how much space per process should be allocated in the page and swap files? And why are so many processes being swapped out?

Mark Moore
(Green Assistant-System-Manager who will gladly accept all the info he can get)
MOORE@UTHSCSA.BITNET

P.S. I was going to call TSC, but no one answers. I think they are having some severe weather problems.
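
The sizing part of question 4 comes down to simple arithmetic once a per-process figure is known. As a minimal sketch, the Python below divides a swap file's size by a process count, using the figures from this post (50,000 and 100,000 blocks, 120 logins). It assumes one process per login and that the secondary swap file is the only swap space, neither of which the post states; the function name is hypothetical, and this is not a description of how VMS itself allocates swap space.

```python
def blocks_per_process(swap_file_blocks: int, process_count: int) -> float:
    """Average swap file blocks available per process (hypothetical helper).

    Assumes every process shares one swap file equally, which is a
    simplification made for illustration only.
    """
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    return swap_file_blocks / process_count


if __name__ == "__main__":
    logins = 120  # login limit quoted in the post
    for size in (50_000, 100_000):  # old and new secondary swap file sizes
        share = blocks_per_process(size, logins)
        print(f"{size} blocks / {logins} processes = {share:.0f} blocks each")
```

Under those assumptions the result is only an upper bound on the average space available to each process, not a measurement of what any process actually uses.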