Path: utzoo!attcan!uunet!timbuk!cs.umn.edu!uc!tut.cis.ohio-state.edu!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!sics.se!uplog.se!uplog.uppsala.telesoft.se!thomas
From: thomas@uppsala.telesoft.se (Thomas Tornblom)
Newsgroups: comp.unix.misc
Subject: Process supervision in large SW systems.
Message-ID:
Date: 31 Oct 90 13:59:08 GMT
Sender: thomas@uplog.se (Thomas Tornblom)
Distribution: comp
Organization: Telesoft Uppsala AB
Lines: 27

What methods do people use to control, create, kill, and supervise
processes in large software systems?

People are asking me how to implement a supervision system. It should be
responsible for checking that processes are alive and healthy. It should
also be able to restart, in some intelligent way, a process that has died
or is misbehaving.

Problem areas are interprocess communication and how to detect status
changes (a strict hierarchy of processes? catching SIGCHLD?).

People must have done this before in systems that require high
reliability. The system is going to be used on fault-tolerant hardware in
the future, so we need fault-tolerant software.

E-mail preferred.

Thanks,
Thomas
--
Real life:   Thomas Tornblom             Email: thomas@uppsala.telesoft.se
Snail mail:  Telesoft Uppsala AB         Phone: +46 18 189406
             Box 1218                    Fax:   +46 18 132039
             S - 751 42 Uppsala, Sweden
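
To make the "strict hierarchy" idea in the question concrete, here is a minimal sketch, not a definitive design. It assumes every supervised process is a direct child of the supervisor, so the supervisor can collect each child's exit status itself. The function name supervise, the specs format, and the restart policy (restart on a non-zero exit, give up after max_restarts) are all hypothetical choices made for illustration. It polls for simplicity; a real supervisor would more likely react to SIGCHLD than sleep in a loop.

```python
import subprocess
import sys
import time


def supervise(specs, max_restarts=3, poll_interval=0.05):
    """Run each named command and restart it when it exits abnormally.

    specs maps a name to an argv list (hypothetical format).
    Returns a dict mapping each name to the number of restarts performed.
    """
    children = {name: subprocess.Popen(argv) for name, argv in specs.items()}
    restarts = {name: 0 for name in specs}

    while children:
        for name, proc in list(children.items()):
            status = proc.poll()      # non-blocking check for an exit status
            if status is None:
                continue              # still running
            if status == 0:
                del children[name]    # clean exit: stop supervising it
            elif restarts[name] < max_restarts:
                restarts[name] += 1   # abnormal exit: start a fresh copy
                children[name] = subprocess.Popen(specs[name])
            else:
                del children[name]    # too many failures: give up
        time.sleep(poll_interval)
    return restarts


if __name__ == "__main__":
    # Illustrative commands: one exits cleanly, one always fails.
    ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
    bad = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise({"ok": ok, "bad": bad}, max_restarts=2))
```

This only detects processes that have died. Detecting a process that is alive but "doing the wrong thing" would need something extra, such as a periodic heartbeat over whatever interprocess communication the system already uses, which the sketch does not attempt.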