Xref: utzoo comp.unix.wizards:20185 comp.windows.x:17091 comp.sys.hp:4003
Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!cernvax!chx400!forty2!mecazh!paul
From: paul@mecazh.UUCP (Paul Breslaw)
Newsgroups: comp.unix.wizards,comp.windows.x,comp.sys.hp
Subject: Interrupted library calls
Keywords: Xlib, library, signals, longjmp
Message-ID: <373@node17.mecazh.UUCP>
Date: 16 Jan 90 17:36:33 GMT
Organization: Mecasoft SA, Zurich, Switzerland
Lines: 82

This problem cropped up in the context of Xlib, but could equally
apply to any Unix library. Hence the posting to more than one group.

Our application (a CAM package on HP9000/3xx machines under HP-UX 6.5,
X11 R2) sometimes crashes when we handle a signal and return from the
signal handler in a different context from the one in which the handler
was entered. In other words, we do a longjmp(3) from inside the handler.
We found that this is an elegant way to design certain features into a
program.

[ Those of you who might want to argue this assertion, read on. Those
who are prepared to accept it can skip to the end of this []'ed bit.

Our CAM package is a monolithic application running as a single
process. Until Open Look or Motif is declared winner of the current
X Look and Feel War, our application remains implemented using no
toolkit, i.e. only pure Xlib calls.

A user of our package can start a computation/display operation that
might take a long time to complete. We wanted to allow him to hit a key
to stop it, which would take him back to an earlier point in the
dialogue. There are a large number of such long operations, so we
needed a fairly general mechanism.

We did not want to sprinkle calls to X arbitrarily in the code in the
hope that they would provide a frequent enough poll. Neither did we
want a signal handler to set a global flag and return normally, because
that is simply the same polling problem in a different guise. You then
have to sprinkle calls to check the global flag in the hope ... etc etc.

So we had to have a signal handler to make the interruption
asynchronous, and it had to exit abnormally to achieve its end. ]

It is, all the same, a pretty dangerous thing to do. This is especially
so if the signal is allowed to interrupt any old bit of code that might
be updating some data structure that is subsequently needed. And this,
of course, is what happened when certain Xlib routines were interrupted.

Now good old BSD and friends (like Ultrix and HP-UX) offer a number of
means for dealing with the problem.

1. Interrupted system calls can be identified, and restarted when (if)
   the signal handler returns normally.

2. The application can be defensively programmed so that system calls
   which can be interrupted or partially completed are correctly
   handled.

3. Critical regions can be created with sigblock(2) and sigsetmask(2),
   providing DISABLE and ENABLE capabilities.

Clearly 1 and 2 are fine for system calls, but useless for libraries.
That leaves 3 - but whose responsibility is it to defend the data in
the library - the implementor's or the user's?

I suppose someone out there will cry `caveat emptor', but there are
literally hundreds of X calls. How do I know which ones are critical
and which ones are not? If I bracket all the ones I use, I will end up
with ugly code that runs slowly (remember, it's two system calls per X
call).

Clearly this is a general problem, but I do not recall seeing anything
about it on the net.

Advice welcomed.

Paul Breslaw.
--
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Paul Breslaw, Mecasoft SA,          |  telephone :  41 1 362 2040
Guggachstrasse 10, CH-8057 Zurich,  |  e-mail    :  mcsun!chx400!mecazh!paul
Switzerland.                        |               paul@mecazh.UUCP