Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!wuarchive!udel!haven!adm!news
From: mike@BRL.MIL (Mike Muuss)
Newsgroups: comp.unix.wizards
Subject: Re: Checkpoint/Restart (was "no subject - file transmission")
Message-ID: <24239@adm.BRL.MIL>
Date: 21 Aug 90 04:25:26 GMT
Sender: news@adm.BRL.MIL
Lines: 42

>> And I remember people bragging about how cheap and small Unix
>> processes were.  How things have changed.

UNIX processes still are pretty cheap, compared to more "traditional"
operating systems (like OS/360).

The real source of difficulty in checkpoint/restart comes from
interfaces to "stateful" resources, like:

*)  Tape drives.  Need to get the right reel back, in the right
    position.  And hope that no other application or user has
    modified the tape in the interval between checkpoint and restart.

*)  Terminals.  All the terminal modes should be saved and restored.
    What about other processes that might have come along in the
    meantime and started using the terminal, on restart?

*)  Network connections.  The system can't keep the connection open
    while it's down.  In general, it is not possible for the operating
    system to know how to restore the state of a network connection.
    Even saving the entire output stream and re-sending is not likely
    to have the right result.

*)  Temporary files.  If the process depends on files in /tmp (which
    may or may not be open at the instant that the checkpoint is
    taken), and the system has a policy of clearing /tmp on reboot,
    then trouble will result.

Therefore, I assert that it is the state of the I/O system, not the
state of the UNIX processes, that is hard to checkpoint.  Indeed, it
is trivial to checkpoint file pointers, PID's, and other aspects of
the *process* state.  It isn't too hard to make sure that files have
not changed between checkpoint and restart times.

So, please don't bash the UNIX Process concept.  Checkpoint/restart in
any non-trivial I/O environment is *hard*.

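To make the "trivial" part concrete, here is a minimal sketch of saving
and restoring a file pointer for a regular file, with a check that the
file has not changed in between.  It is written in Python for brevity
and is not taken from UNICOS or any real checkpointing system.  The
names checkpoint_fd and restore_fd are hypothetical, and comparing size,
modification time, and inode number is only one assumed notion of
"unchanged"; a same-size rewrite within the timestamp granularity
would slip past it.

```python
import os


def checkpoint_fd(fd):
    """Record what is needed to reposition a regular file later (hypothetical helper)."""
    st = os.fstat(fd)
    return {
        "offset": os.lseek(fd, 0, os.SEEK_CUR),  # current file pointer
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "ino": st.st_ino,
    }


def restore_fd(path, saved, flags=os.O_RDONLY):
    """Reopen path and seek to the saved offset; refuse if the file looks changed."""
    fd = os.open(path, flags)
    st = os.fstat(fd)
    now = (st.st_size, st.st_mtime_ns, st.st_ino)
    then = (saved["size"], saved["mtime_ns"], saved["ino"])
    if now != then:
        os.close(fd)
        raise RuntimeError("file changed since checkpoint: " + path)
    os.lseek(fd, saved["offset"], os.SEEK_SET)  # put the file pointer back
    return fd
```

Nothing comparable works for the tape, terminal, or network cases above:
there the state that matters lives outside the process and cannot be
recovered with a stat and a seek.
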
Cray Research has been rather successful in implementing
checkpoint/restart in their UNICOS version of UNIX.  I believe that
they have reported on this work, but offhand I don't have any
references.

	Best,
	 -Mike Muuss