Path: utzoo!mnetor!uunet!husc6!purdue!gatech!mcdchg!usenet
From: lenb@houxs.UUCP
Newsgroups: comp.unix
Subject: Children's exit() status
Message-ID: <4626@mcdchg.UUCP>
Date: 22 Feb 88 17:13:02 GMT
Sender: usenet@mcdchg.UUCP
Lines: 65
Keywords: UNIX SVR3.1 fork() exec() exit() signal()
Approved: usenet@mcdchg.UUCP

Okay UNIX Sys V hackers, here's a question for you. In the following scenario, how should a parent process wait for its children to complete?

REQUIREMENT:

I have a parent process that forks 30 identical children. The children conduct some measurements, and when done, each sends a single IPC message with results back to the parent and exits. The children are identical, so they should all have roughly equal life spans, though that time may vary between 5 and 15 minutes.

The parent needs to be woken when the first child exits -- a straightforward wait(). The parent must also know if any children complete in error. It is preferable that the parent check the children's exit status for any errors, since the system may indicate strange situations in the exit status, and the children are already designed to use exit(code).

POSSIBLE SOLUTIONS:

Here's what I've thought of so far. There seem to be 2 types of solutions: either use wait(), with or without SIGCLD, or use blocking message receives.

I'd like to use wait(), because the children have a meaningful exit status. The question is: is it possible for my program to be woken up only 20 times for 30 children? I.e., could I miss child deaths because several occur "simultaneously"? ("Simultaneously" meaning that while I'm awake checking one child's return code, another 2 children die, and the next wait() misses one or both of them.)

If I *do* miss child deaths, then upon each wakeup from wait() I could kill(pid, 0) each of the children to see if they're all dead. I wouldn't miss any deaths that way, but I'd still miss some exit codes.
If I'm going to miss exit codes, I could use signal(SIGCLD, SIG_IGN) after the first child's death to wait() for the last child's death. Then I'd check to see if I have 30 messages waiting. There are warnings about using this signal in signal(2), so this is no good.

Another possibility is to have the children send a software signal to the parent just before they die. I wouldn't miss any deaths, but this is no help with exit codes.

Another solution is to use vanilla blocking message receives. I know how many children I have, and could expect that number of messages. I'd have to change the children to not send a message if they encountered a problem -- the message in effect acting as a "normal" return code. However, error codes from built-in exit()s would be lost, unless the children were redesigned to send the code in a message before exiting. I'd also lose any system information encoded in the exit code.

Has anybody out there run into this type of situation? Any facts, clues or pointers appreciated. If you reply, please cc: email since I don't often read news.

Thanks.

Len Brown 201-949-0092
{ ihnp4 etc. }!houxs!lenb
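
For reference, the wait()-based approach discussed above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes SIGCLD is left at its default disposition, in which case a terminated child stays around as a zombie until the parent waits for it, so calling wait() once per forked child should return every exit status even when several children die close together. The names spawn_children(), reap_children() and child_work() are hypothetical, child_work() is only a stand-in for the real measurement code, and the WIFEXITED-style macros are the POSIX ones from <sys/wait.h>, which an SVR3.1 system may not provide (there the status word would have to be decoded by hand).

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the measurement code: every seventh child
 * (index 0, 7, 14, ...) reports failure so the parent has something
 * to detect. */
static int child_work(int index)
{
    return (index % 7 == 0) ? 1 : 0;
}

/* Fork up to n children. Each child runs child_work() and exits with
 * its return value. Returns the number of children actually started,
 * which is less than n only if fork() fails. */
static int spawn_children(int n)
{
    int i;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return i;               /* fork failed; i children are running */
        if (pid == 0)
            _exit(child_work(i));   /* child: exit status carries the result */
    }
    return n;
}

/* Call wait() until `expected` children have been collected. Stores the
 * number collected in *reaped and returns how many of them did not exit
 * with status 0 (nonzero exit code or death by signal). Returns -1 if
 * wait() fails for a reason other than EINTR, e.g. ECHILD. */
static int reap_children(int expected, int *reaped)
{
    int failures = 0;

    *reaped = 0;
    while (*reaped < expected) {
        int status;
        pid_t pid = wait(&status);

        if (pid < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal; retry */
            return -1;
        }
        (*reaped)++;
        if (WIFEXITED(status)) {
            if (WEXITSTATUS(status) != 0) {
                failures++;
                fprintf(stderr, "child %ld exited with code %d\n",
                        (long)pid, WEXITSTATUS(status));
            }
        } else if (WIFSIGNALED(status)) {
            failures++;
            fprintf(stderr, "child %ld killed by signal %d\n",
                    (long)pid, WTERMSIG(status));
        }
    }
    return failures;
}
```

A parent would call spawn_children(30) and pass the returned count to reap_children(). The first wait() returns as soon as the first child exits, and the loop ends once every started child has been accounted for, so the kill(pid, 0) polling should not be needed under the stated assumption.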