Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!cis.ohio-state.edu!karl_kleinpaste
From: karl_kleinpaste@cis.ohio-state.edu
Newsgroups: comp.sys.pyramid
Subject: help with mbuf leak problem?
Message-ID:
Date: 15 Sep 90 14:33:41 GMT
Sender: news@tut.cis.ohio-state.edu
Organization: Ohio State Computer Science
Lines: 84

Pyramid 98xe, OSx4.4c, nfsd 8, biod 8.

One of my Pyrs has developed a nasty problem in the last 16 hours or so: serious mbuf lossage. Here's a netstat -m output just before his last reboot, about 10 minutes ago:

    2003/2032 mbufs in use:
        1877 mbufs allocated to data
        12 mbufs allocated to packet headers
        109 mbufs allocated to routing table entries
        3 mbufs allocated to socket names and addresses
        2 mbufs allocated to interface addresses
    128/128 mapped pages in use
    510 Kbytes allocated to network (99% in use)
    1 requests for memory denied

Note excessive data mbuf allocation, and 99% utilization. Consider the same thing from his twin, in the next cabinet, looking quite normal and running for days:

    86/288 mbufs in use:
        3 mbufs allocated to data
        4 mbufs allocated to packet headers
        75 mbufs allocated to routing table entries
        2 mbufs allocated to socket names and addresses
        2 mbufs allocated to interface addresses
    28/96 mapped pages in use
    228 Kbytes allocated to network (29% in use)
    0 requests for memory denied

This leakage started happening sometime around 5pm or 6pm last evening. I have had to reboot almost hourly just to keep the @#$% machine alive.

I've experimented with several things, trying to find the cause. Killing off assorted network daemons didn't help; sendmail, nntp, inetd as a whole, routed, pcnfsd were all killed, and yet the data mbuf allocation keeps ratcheting upward. I tried rebooting with 16 nfsd/biod but this was no help either. Killing off all nfsd/biod and the portmapper didn't help. Renicing nfsd and/or biod didn't help. As near as I can see, nothing running on the Pyr itself is the cause of this.
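
To put a number on how fast the data mbuf count ratchets upward, successive netstat -m snapshots can be parsed and compared. What follows is only a minimal sketch, assuming the netstat -m layout shown above; the function names (parse_netstat_m, data_mbufs_per_minute) and the dictionary keys are made up for illustration and are not part of any Pyramid tool.

```python
import re

# Snapshot copied from the failing machine's netstat -m output above.
SICK = """\
2003/2032 mbufs in use:
    1877 mbufs allocated to data
    12 mbufs allocated to packet headers
    109 mbufs allocated to routing table entries
    3 mbufs allocated to socket names and addresses
    2 mbufs allocated to interface addresses
128/128 mapped pages in use
510 Kbytes allocated to network (99% in use)
1 requests for memory denied
"""

# Snapshot copied from the healthy twin's netstat -m output above.
HEALTHY = """\
86/288 mbufs in use:
    3 mbufs allocated to data
    4 mbufs allocated to packet headers
    75 mbufs allocated to routing table entries
    2 mbufs allocated to socket names and addresses
    2 mbufs allocated to interface addresses
28/96 mapped pages in use
228 Kbytes allocated to network (29% in use)
0 requests for memory denied
"""


def parse_netstat_m(text):
    """Turn netstat -m text into a dict of integer counters (hypothetical helper)."""
    stats = {}
    m = re.search(r"(\d+)/(\d+) mbufs in use", text)
    if m:
        stats["in_use"] = int(m.group(1))
        stats["total"] = int(m.group(2))
    # One entry per "N mbufs allocated to <kind>" line, keyed by <kind>.
    for count, kind in re.findall(r"(\d+) mbufs allocated to ([a-z ]+)", text):
        stats[kind.strip()] = int(count)
    m = re.search(r"(\d+) requests for memory denied", text)
    if m:
        stats["denied"] = int(m.group(1))
    return stats


def data_mbufs_per_minute(earlier, later, seconds_apart):
    """Growth rate of data mbufs between two parsed snapshots (hypothetical helper)."""
    return (later["data"] - earlier["data"]) * 60.0 / seconds_apart


if __name__ == "__main__":
    for name, text in (("victim", SICK), ("twin", HEALTHY)):
        s = parse_netstat_m(text)
        print(name, "data mbufs:", s["data"], "of", s["in_use"], "in use")
```

Logging parsed snapshots every few minutes, and noting when the growth rate changes as clients are disconnected one at a time, is one way to tie the leak to a particular source.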
"etherfind -r -n src victim-pyr or dst victim-pyr" run from a nearby SunOS4.1 Sun3 shows a great deal of NFS traffic, of this form: UDP from another-pyr.1023 to victim-pyr.2049 128 bytes RPC Call prog 200000 proc 1 V1 [93dc7] UDP from victim-pyr.2049 to another-pyr.1023 104 bytes RPC Reply [93dc7] AUTH_NULL Success UDP from another-pyr.1023 to victim-pyr.2049 172 bytes RPC Call prog 200000 proc 9 V1 [93dc8] UDP from victim-pyr.2049 to another-pyr.1023 36 bytes RPC Reply [93dc8] AUTH_NULL Success UDP from another-pyr.1023 to victim-pyr.2049 172 bytes RPC Call prog 200000 proc 9 V1 [93dc9] UDP from victim-pyr.2049 to another-pyr.1023 36 bytes RPC Reply [93dc9] AUTH_NULL Success UDP from another-pyr.1023 to victim-pyr.2049 128 bytes RPC Call prog 200000 proc 1 V1 [93dca] UDP from victim-pyr.2049 to another-pyr.1023 104 bytes RPC Reply [93dca] AUTH_NULL Success But not all of this traffic is coming from another-pyr -- assorted Pyrs, Suns, and the occasional HP show up. I'm also getting messages like NFS server write failed: (err=13, dev=0xffa610a4, ino=0xffa69bd0). on the console occasionally. Errno 13 is EACCES. ??? The only anomalous thing about this Pyr's configuration is that it's the departmental /usr/spool/mail NFS server. But that's been the case for a couple of years now, nothing new or unusual about that. As I said, I'm rebooting roughly hourly at this point to keep it alive. It seems to perform admirably right up until the end, when the 2032/2032 mbuf condition hits. It reboots in 10 minutes and is fine again for the next hour, while the mbuf count goes up. Clues, anyone? I can't think of anything that would have been started at 5pm on a Friday evening which might cause this sort of thing. What sort of activity on the Pyr or elsewhere on my network should I be looking for? --karl