Path: utzoo!attcan!uunet!lll-winken!ames!apple!vsi1!wyse!mips!jay
From: jay@mips.COM (Jay McCauley)
Newsgroups: comp.unix.questions
Subject: Re: Insufficient Resource Error on msgsnd Call
Keywords: msgsnd, msgrcv, Ingres, UNIX, System V, message queues, Sun, SunOS4.0
Message-ID: <20284@winchester.mips.COM>
Date: 24 May 89 19:34:20 GMT
References: <1023@dinl.mmc.UUCP>
Reply-To: jay@mips.COM (Jay McCauley)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 34

In article <1023@dinl.mmc.UUCP> noren@dinl.UUCP (Chuck Noren) writes:
>We have been developing an application that uses System V message queues
>(perhaps that's the first mistake :-)) for interprocess communication.
>Everything has worked fine until we really wanted to stress test the
>application by sending it hundreds of messages at once. The application
>chugs away nicely until it hangs.

What is probably happening is that you have hit the underdocumented
"feature" that System V message queues all share a very small temporary
buffer space for messages in transit. In some systems I've used, the
pool was 8 KB, apparently the default size in System V Release 2. It is
very easy to create deadlocks with this implementation. I'm not sure
how the pieces of Ingres communicate with each other, but if they too
use the msg facilities, bingo, a deadlock.

There is no really good way out of this. One way is to use non-blocking
writes (msgsnd with IPC_NOWAIT) and pause a bit if you get denied. The
implementation where I first encountered this had a tricky backoff
protocol that attempted to prevent the deadlock. It worked well in
practice, but is not theoretically perfect.

Increasing the buffer pool is typically just a matter of a binary
reconfig of the kernel. This can reduce the frequency of the problem,
but cannot eliminate it.

We also felt, but could never prove, that there were flaws in the
standard System V implementation of the message facilities.
Every so often we would see mysterious confusion in the ipcs reports
that would only clear after a reboot. This occurred on several
different vendors' systems, so we suspected that there was some sort of
subtle synchronization glitch in the kernel msgop support.

Jay McCauley
MIPS
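
For illustration, the "non-blocking write, pause a bit if denied" idea
might look roughly like the sketch below. It is not the backoff
protocol mentioned above, whose details are not given here. The names
send_with_backoff and next_delay_ns, the 1 ms starting pause, and the
256 ms cap are all invented for the example, and nanosleep assumes a
POSIX system more recent than the ones discussed.

```c
#define _XOPEN_SOURCE 700   /* expose msgsnd() and nanosleep() in strict modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <time.h>

#define START_DELAY_NS   1000000L   /* 1 ms; arbitrary choice for this sketch */
#define MAX_DELAY_NS   256000000L   /* 256 ms cap; also arbitrary */

/* Hypothetical helper: double the pause, but never exceed the cap. */
long next_delay_ns(long cur, long cap)
{
    return (cur >= cap / 2) ? cap : cur * 2;
}

/*
 * Hypothetical helper: try a non-blocking msgsnd() up to max_tries times,
 * pausing a little longer after each denial.
 * Returns 0 on success. Returns -1 with errno set to EAGAIN if the queue
 * stayed full, or to msgsnd()'s own error for any other failure.
 * msg must point to a structure that begins with a positive long mtype;
 * len is the size of the text that follows it.
 */
int send_with_backoff(int qid, const void *msg, size_t len, int max_tries)
{
    struct timespec delay = { 0, START_DELAY_NS };
    int attempt;

    assert(max_tries > 0);

    for (attempt = 0; attempt < max_tries; attempt++) {
        if (msgsnd(qid, msg, len, IPC_NOWAIT) == 0)
            return 0;                   /* message queued */
        if (errno != EAGAIN)
            return -1;                  /* real error, not "no room" */
        nanosleep(&delay, NULL);        /* denied: pause a bit */
        delay.tv_nsec = next_delay_ns(delay.tv_nsec, MAX_DELAY_NS);
    }
    errno = EAGAIN;                     /* still no room after all tries */
    return -1;
}
```

A caller that gets -1 with errno set to EAGAIN still has to decide what
to do next (for instance, service its own incoming queue before trying
again). The loop only turns a silent hang in msgsnd into a visible
failure; it does not remove the underlying deadlock risk.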