Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!apple!sun-barr!newstop!sun!amdahl!rtech!stevel
From: stevel@rtech.rtech.com (Steve Langley)
Newsgroups: comp.windows.x
Subject: "fatal IO error 32" problems in Sun Server
Keywords: Sun fatal IO error io.c FlushClient
Message-ID: <3947@rtech.rtech.com>
Date: 31 Oct 89 19:16:25 GMT
Distribution: na
Organization: Relational Technology Inc. Alameda, CA 94501
Lines: 62

We have a problem with the default Sun server that I hope someone at MIT
can comment on. From time to time some X applications we are building
die with the following error message:

    XIO:  fatal IO error 32 (Broken pipe) on X server "unix:0.0"
          after 8499 requests (8181 known processed) with 44 events remaining.
          The connection was probably broken by a server shutdown or KillClient.

The only thing the failures have in common is that they seem to happen
when the event queue on the server side is filling up with unprocessed
events. For example, after a button is pressed, a callback routine might
go into a frenzy of activity by creating, destroying, moving, resizing,
mapping, and unmapping widgets. This generates a large number of requests
to the server and events for the client to read. But since the callback
is doing this without ever calling XtMainLoop, the events just queue up
until we return from the callback. So far no problem. But every once in
a while the above error occurs.

I was able to track it down to the server/os/4.2bsd/io.c routine. (We
are running SunOS 4.0 on Sun 3/60s, X11R3 with (I think) fixes 1
through 8 installed.) As far as I can tell, Dispatch calls
FlushAllOutput, which calls

    FlushClient(client, oc, (char *)NULL, 0);

FlushClient calls the writev routine, and most of the time everything
works. But sometimes there is no I/O to be written, and so iovCnt == 0.
The writev routine doesn't like this, and fails with an error (EINVAL)
because of invalid arguments.
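
To illustrate the zero-count case described above, here is a minimal
sketch in C. It is not the actual io.c code: flush_iov and
probe_empty_writev are hypothetical names made up for this example. The
first skips the system call when there is nothing to write; the second
reports what writev does with a zero count on the machine at hand. POSIX
allows writev to fail with EINVAL when the count is zero or less, but
some systems simply return 0, so the result may differ from what SunOS
4.0 does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the real FlushClient): write iov_count
 * buffers to fd, but return 0 without calling writev when there is
 * nothing to write, so a zero count can never surface as an error.
 */
static ssize_t flush_iov(int fd, const struct iovec *iov, int iov_count)
{
    assert(iov_count >= 0);
    if (iov_count == 0)
        return 0;               /* nothing queued: skip the system call */
    return writev(fd, iov, iov_count);
}

/*
 * Hypothetical probe: call writev with a zero count and return 0 if it
 * succeeds, otherwise the errno it set (EINVAL on systems that reject
 * a zero count).
 */
static int probe_empty_writev(int fd)
{
    struct iovec unused;

    unused.iov_base = NULL;
    unused.iov_len = 0;
    errno = 0;
    return writev(fd, &unused, 0) < 0 ? errno : 0;
}
```

With a guard like this in front of the call, an empty flush looks the
same as a successful write of zero bytes, whichever way the underlying
writev behaves.
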
Because errno is not equal to either EWOULDBLOCK or EBADF, FlushClient
assumes the write has failed and the client has died, leading it to
close the connection. If 'notdef' had been defined in io.c you would see
the message:

    Closing connection xx because write failed

Ultimately this results in the client seeing the 'fatal IO error' above.

Now, is this a bug or a feature? Is there a known problem like this in
the Sun server and I just haven't picked up the fix? Or am I
inadvertently doing something in my application that makes this happen?
Is it a Bad Thing to generate a lot of events, and if so, how many are
a "lot"?

I put a line of code in FlushClient which just returns without doing any
I/O if iovCnt == 0; this seems to cure the problem. (I added an ErrorF
message to tell me when this happens; every now and then the message
appears and the client keeps on running, apparently with no problems.)

So, any comments? If the answer is "fixed in R4" that's okay (since I
seem to have a workaround), but I'd appreciate some more information on
what's happening here.

+--------------------------------------------------------------------------+
| Steve Langley                 | Phone: (415)748-3658                     |
| Relational Technology, Inc.   | Internet: stevel@ws58s.rtech.com         |
| P.O. Box 4008                 |                                          |
| 1080 Marina Village Parkway   |                                          |
| Alameda, California 94501     |                                          |
+--------------------------------------------------------------------------+