Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!ucsd!hub.ucsb.edu!spectrum.CMC.COM!lars
From: lars@spectrum.CMC.COM (Lars Poulsen)
Newsgroups: comp.unix.wizards
Subject: Re:  Checkpoint/Restart
Message-ID: <1990Aug22.191258.19072@spectrum.CMC.COM>
Date: 22 Aug 90 19:12:58 GMT
References: <24239@adm.BRL.MIL>
Organization: Rockwell CMC
Lines: 87

In article <24239@adm.BRL.MIL> mike@BRL.MIL ( Mike Muuss) writes:
> Checkpoint/restart in any non-trivial I/O environment is *hard*.

True, indeed. It is fairly instructive to look at how (other?)
commercial operating systems have dealt with this issue. As could be
expected, there is a wide variety of checkpoint/restart implementations.

The earliest checkpoint/restart implementations in the days of
single-user machines were just memory dumps, with tape drive
repositioning and a way to notify the application that it had been
restarted. IBM70{4,9,40,90,94} type stuff. CDC3600 SCOPE.

When direct-access storage came along, it was originally small, and used
for temporary files; so it was copied to the checkpoint tape. The
checkpoint system that I know best - UNIVAC 1100 EXEC-8 - is of this
type. A checkpoint file is usually a tape file, containing a memory
image, all spool files (input and output) and all temporary files. File
pointers are not an issue, since all permanent disk files are direct
access files (the read/write calls have a file position in them) so
"file pointers" live in user space.

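In UNIX terms, the EXEC-8 style of direct-access I/O looks roughly like pread()/pwrite(). The sketch below is only an illustration under my own assumptions: REC_SIZE, struct rec_cursor and the two helper names are made up for the example, not taken from any real system. The point is that the position travels in each call, so the only "file pointer" is an ordinary variable that a memory image captures for free.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_SIZE 32             /* hypothetical fixed record size */

/* All "file pointer" state lives here, in user memory. */
struct rec_cursor {
    off_t next_rec;             /* index of the next record to read */
};

/* Write record number `rec`; the position is an argument to the call,
 * not state kept by the kernel. Returns 0 on success, -1 on failure. */
int put_record(int fd, off_t rec, const char *data)
{
    char buf[REC_SIZE];

    assert(data != NULL);
    memset(buf, 0, sizeof buf);
    strncpy(buf, data, REC_SIZE - 1);   /* always leaves a trailing NUL */
    return pwrite(fd, buf, REC_SIZE, rec * REC_SIZE) == REC_SIZE ? 0 : -1;
}

/* Read the record the cursor points at, then advance the cursor.
 * Returns 0 on success, -1 on a short read or end of file. */
int get_next_record(int fd, struct rec_cursor *cur, char *out)
{
    assert(cur != NULL && out != NULL);
    if (pread(fd, out, REC_SIZE, cur->next_rec * REC_SIZE) != REC_SIZE)
        return -1;
    cur->next_rec++;
    return 0;
}
```

A restarted job would reopen the file by name and carry on with the rec_cursor it finds in its restored memory; the kernel's own offset for the descriptor is never consulted.
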
Even so, the checkpoints were complex enough that my installation (an
academic computing center) disabled the checkpoint facility since
ill-structured checkpoint restarts often crashed the system. (How about
restarting from a checkpoint taken on a different system - or before
last week's sysgen?)

Interestingly enough, EXEC-8 retrograded in later releases to provide a
lesser checkpoint (memory image only) known as a "partial checkpoint" as
a cheaper and safer alternative.

> ...  The real source of difficulty in checkpoint/restart comes from
>interfaces to "stateful" resources, like:

Yes, there is a TON of state information to be preserved. For all but
trivial tasks, this involves many megabytes of file space.
>
>*)  Tape drives.  Need to get the right reel back, in the right position.

Easy, compared to the other stuff.

>*)  Terminals.  All the terminal modes should be saved and restored.
>What about other processes that might have come along in the meantime
>and started using the terminal, on restart?

Indeed, the semantics of shared terminal devices are a great source of
implementation problems. This is probably a misfeature.
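
For what it is worth, saving and restoring the modes themselves is the easy half. A minimal sketch using the POSIX termios calls follows; the two function names are my own, and nothing here addresses the hard half, namely other processes that have opened the terminal in the meantime.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Copy the current terminal modes of `fd` into `saved`.
 * Returns 0 on success, -1 if fd is not a terminal or the call fails. */
int save_tty_modes(int fd, struct termios *saved)
{
    assert(saved != NULL);
    if (!isatty(fd))
        return -1;
    return tcgetattr(fd, saved);
}

/* Re-apply previously saved modes to `fd` immediately.
 * Returns 0 on success, -1 if fd is not a terminal or the call fails. */
int restore_tty_modes(int fd, const struct termios *saved)
{
    assert(saved != NULL);
    if (!isatty(fd))
        return -1;
    return tcsetattr(fd, TCSANOW, saved);
}
```

The struct termios would simply be written into the checkpoint alongside the memory image. Whether it is still right to impose those modes at restart time is exactly the shared-terminal question above.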

>*)  Network connections.  The system can't keep the connection ... 

Agreed. Other than the controlling terminal, network connections should
be banned. And the controlling terminal should be a disconnectable
virtual terminal like VMS' VTAxxx: device.

>*)  Temporary files.  If the process depends on files in /tmp ...

The biggest problem here is that UNIX does not know the concept of
temporary files. A _real_ temporary file is what you have after
	fd = creat("/tmp/xxxx" ...
	unlink("/tmp/xxxx");
But UNIX would have no way of restoring such a beast, I think: once the
name is unlinked, only the open descriptor still refers to the file.
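
To make the problem concrete, here is a small sketch of that idiom with the error checks filled in. The function names and the open() flags are my choice for illustration, not a claim about how any particular program does it. After the unlink, fstat() on the descriptor typically reports a link count of zero: the data is alive, but there is no name a restart could use to find it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a scratch file at `path` (a hypothetical scratch name) and
 * remove its name at once. Returns an open descriptor, or -1 on failure. */
int open_anonymous_temp(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (unlink(path) < 0) {     /* the file now lives only through fd */
        close(fd);
        return -1;
    }
    return fd;
}

/* Number of directory entries still naming the open file, or -1. */
long link_count(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}
```

A checkpoint that wanted to preserve such a file would have to copy its contents through the descriptor into the checkpoint itself, much as EXEC-8 copied temporary files to the checkpoint tape.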

>Therefore, I assert that it is the state of the I/O system, not the state
>of the UNIX processes, that is hard to checkpoint.  Indeed, it is trivial
>to checkpoint file pointers, PID's, and other aspects of the *process*
>state.  It isn't too hard to make sure that files have not changed
>between checkpoint and restart times.

But in many cases you DO want to change the file. Sometimes the failure
you are recovering from was caused by bad data in a permanent file. You
want to be able to fix the bad record and then restart from the last
checkpoint before that record was seen.

The biggest can of worms has not even been touched upon here: What about
the state of a large DBMS that the checkpointed process may be
accessing? Do you want to restore it to the state when the checkpoint
was taken, thus backing out all updates since the large job failed?
When the job failed, were all transactions performed by the job backed
out? If so, the before- and after-looks need to be part of the
checkpoint so they can be re-installed. What if those records have been
updated since the checkpoint?
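
A very simplified sketch of the decision a restart would face for each journaled record, assuming fixed-length records and a checkpoint that carries both looks. The struct, the enum and classify() are all hypothetical names of mine, not any DBMS's interface.

```c
#include <assert.h>
#include <string.h>

#define REC_LEN 16              /* hypothetical fixed record length */

/* One journal entry saved with the checkpoint: the record as the job
 * found it (before-look) and as the job left it (after-look). */
struct look_pair {
    long rec_id;
    char before[REC_LEN];
    char after[REC_LEN];
};

enum restart_action {
    RESTORE_BEFORE,             /* job's update still in place: back it out */
    ALREADY_BACKED_OUT,         /* record already matches the before-look   */
    CONFLICT                    /* changed by someone else since the job    */
};

/* Compare the record as it is now against both looks. */
enum restart_action classify(const struct look_pair *lp, const char *current)
{
    assert(lp != NULL && current != NULL);
    if (memcmp(current, lp->after, REC_LEN) == 0)
        return RESTORE_BEFORE;
    if (memcmp(current, lp->before, REC_LEN) == 0)
        return ALREADY_BACKED_OUT;
    return CONFLICT;
}
```

The first two cases are mechanical. The CONFLICT case is the can of worms: no comparison of images can say whether the later update should survive the restart.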

The biggest jobs, which need checkpoints the most, provide the biggest
cans of worms.
-- 
/ Lars Poulsen, SMTS Software Engineer
  CMC Rockwell  lars@CMC.COM