Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!bernsten
From: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Newsgroups: comp.unix.wizards
Subject: Re: Making rm undoable
Message-ID: <7360@phoenix.Princeton.EDU>
Date: 24 Mar 89 10:00:33 GMT
References: <6805@phoenix.Princeton.EDU>
Reply-To: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Distribution: usa
Organization: Princeton U. Undergrad Math Majors, last time I checked
Lines: 146

I did promise a few weeks ago to summarize this in a week...

For those who missed the original posting, I was proposing that rather than somehow converting every rm into an mv (unlink() into rename()), you could prepare beforehand for losing files by making an extra link somewhere safe with ln (link()). (That's what I meant to say, anyway.) After all, link() followed by unlink() is almost the same as rename().

I received several responses, each of which I summarize (in detail) below. Skip to GENERAL THOUGHTS at the end if you don't want to read eighty lines of discussion...

Stephen C. North (hector!north, ulysses!north, hector.homer.nh.att.com) prefers the ``old alias "rm" trick'' for its simplicity. I'd say that there is less to think about for the user, but often-noted problems include:

1. If the file preservation is too invisible, you'll be too careless under a system or shell without it.
2. How do you make sure that all unlink()s use the alias?
3. How do you make sure that shell scripts that really should delete a file actually use the real rm?

Nevertheless, the ``old alias "rm" trick'' does in practice prove quite useful.

James R. Drinkwater (jd@csd4.milw.wisc.edu) sees the problem that every file in the file system would take up an extra inode. He said that when he wanted a file back, it was because of accidental deletion (e.g., rm a* without remembering attn.important) rather than later deciding he needed a file.
He made a general proposal that the trash directory contain not only the deleted files but also soft links to their original positions; this is an excellent idea that applies to all trash directory methods. He also proposed that deleted files could be dumped to tape in the end rather than really erased; I think this would require superuser support and also that everybody use the same trash method---jd proposed a global trash directory. He pointed out that files should remain for at least a fixed (though user-defined) time; I'd say that if the trash is emptied automatically, this had better be true, while if the trash is emptied manually, it shouldn't.

Christopher J. Calabrese (ulysses!cjc@research, cjc@ulysses.att.com) also pointed out that ``you really want to emptytrash only files over a certain age.'' He criticized my proposal, saying it would require too much overhead and too many ``huge and unnecessary directories'' to maintain, as well as time; this is basically correct. He brought up the problem of distinguishing between deleting when you want a copy preserved and deleting when you don't. He said that people most often delete files that they just created, and that for this reason changing rm's behavior is better than my proposal.

Paul English (ileaf!io!speed!pme@eddie.mit.edu, pme@speed.io.uucp) also prefers the idea of changing rm's behavior. He proposed that rather than doing mv, safe rm should make a hard link to the file and then remove the original file. This is, like my method, more restricted than mv, which (on newer systems) can transfer files across filesystems; forcing a physical transfer of a potentially gigantic file is dubious, so I agree that an rm alias should understand the necessity of staying within a filesystem.

Eli ? (echarne@orion.cf.uci.edu) mentioned that on his system, file names beginning with a comma are automatically removed after a few days, and that thus a safe way of removing files is to rename them to ,-files.
I've observed this elsewhere (# files are also commonly removed); renaming files that way seems to me a very good solution.

Barry Shein (bzs@xenna.encore.com) also observed that you usually delete what you're currently working on. He pointed out again the fundamental problem of convincing all programs to unlink() safely---except those shell scripts that should really erase the file (aargh)... He proposed that if UNIX supported real event signals (wake me up when a process does X, and pause that process in the meantime) one could easily trap all unlink()s, and noted that one can effectively do this by using NFS. He mentioned that some editors and other utilities unlink and then recreate the file, which deserves some discussion:

The more common action (shell >, vi, most other programs) is to simply write over the file. This means that trapping unlink() won't stop most changes, and brings to light the fact that version numbering in UNIX is a very very tricky subject. What do you do if a process keeps a file open? Do you say the version number increases on each write() (very inefficient) or on each close()? How do you distinguish between files that should not be version numbered and files that should, and what about disk space? I am tempted to say that because of the unified UNIX philosophy for dealing with everything as just some type of file, version numbering is impossible---but I remember hearing someone mention it is possible, and if I do make my claim, Murphy will ensure that I am publicly proven wrong.

Carl Witty (cwitty@csli.stanford.edu) wondered what mvdir is (it's a general term covering whatever you have to do to move a directory---on BSD, mv can do mvdir, within filesystems...). He reminds us that ``the only cost for an extra hard link is the space in the directory file, which is certainly manageable.'' Of course, this is the opposite view to jd, who worries about all the extra inodes needed.
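
The point about hard links is easy to check: link() adds a second directory entry for an existing inode, it does not allocate a new one. Here is a minimal sketch, written in Python for brevity rather than C (os.link() and os.stat() wrap the link() and stat() calls). The helper name link_cost and the file names are made up for illustration, and it assumes a filesystem that supports hard links:

```python
import os
import tempfile


def link_cost(path, safe_path):
    """Hard-link path to safe_path.

    Returns (inode of the original, inode seen through the new link,
    link count after linking).
    """
    before = os.stat(path)
    os.link(path, safe_path)      # link(): new directory entry, same inode
    after = os.stat(safe_path)
    return before.st_ino, after.st_ino, after.st_nlink


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "attn.important")   # hypothetical file
    with open(original, "w") as f:
        f.write("data\n")
    ino_before, ino_link, nlink = link_cost(original, os.path.join(d, "safe-copy"))
    same_inode = (ino_before == ino_link)
    print(same_inode, nlink)
```

If both names report the same inode number and the link count has gone up by one, the extra link has cost a directory entry and nothing else.
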
I agree with cwitty; I've never seen more than half the inodes used, on any filesystem.

Jerry Peek (jdpeek@rodan.acs.syr.edu) supports my idea and has been looking forward to this summary. Well, now you have it.

Kevin Braunsdorf (ksb@j.cc.purdue.edu) said that at Purdue there are three entombing schemes, of which the best one, maintained by Matt Bradburn (mjb@staff.cc.purdue.edu), is a library redefining unlink(), link(), and rename() to safer versions. ``It works.''

Larry Wall (lwall@devvax.jpl.nasa.gov, lwall@jpl-devvax.jpl.nasa.gov) criticized my scheme since it doesn't work across filesystems, and thus doesn't work over his account. He would rather see a trashcan in each subdirectory; this is an interesting idea.

GENERAL THOUGHTS

The first person who reads this far wins a ... :-)

If UNIX were the type of system where version numbering were possible (oops, I mean common, really I do) then the problem of file deletion would be trivial. But version numbering is not possible (oops, common) in UNIX.

Changing the low-down behavior of at least unlink() and possibly link() and rename(), by (slow) NFS trickery or by a safe-rm library, would completely solve the problem of files being accidentally deleted. Perhaps the kernel should support this. However, this leaves the problem of files that you really want deleted, or the fact that this is not (yet?) the standard and thus programs will be written for the old standard, or shell scripts that only want a temporary file, or ... . So it's not a simple problem.

As for the idea of a more long-term link() to make unlink() more safe, the responses have convinced me that without kernel support this is not an appropriate use of resources for all files. However, it would be useful as a ``preserve'' program that you explicitly invoke upon files that you do not want deleted at any cost.
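
A rough sketch of what such a link-based preserve might look like, again in Python for brevity rather than C (os.link() and os.unlink() are thin wrappers around link() and unlink()). The function signatures, the .preserved directory and the sample file are all assumptions made up for illustration. Name collisions inside the safe directory are ignored, and, as noted above, the link fails if the safe directory is on another filesystem:

```python
import os
import tempfile


def preserve(path, safe_dir):
    """Make an extra hard link to path inside safe_dir; return the link's name."""
    os.makedirs(safe_dir, exist_ok=True)
    saved = os.path.join(safe_dir, os.path.basename(path))
    os.link(path, saved)      # raises OSError across filesystems
    return saved


def unrm(saved, path):
    """Re-create the deleted name from the preserved link."""
    os.link(saved, path)


with tempfile.TemporaryDirectory() as d:
    important = os.path.join(d, "thesis.tex")        # hypothetical file
    with open(important, "w") as f:
        f.write("irreplaceable\n")

    saved = preserve(important, os.path.join(d, ".preserved"))
    os.unlink(important)                             # the accidental rm
    gone = not os.path.exists(important)

    unrm(saved, important)                           # bring the name back
    with open(important) as f:
        recovered = f.read()
```

Since both names refer to the same inode, this protects against unlink() only; overwriting the file in place changes the preserved copy too.
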
preserve would not stop any changes, and it would have to list all those programs that unlink() and recreate files as ``preserve will not work with these, sorry,'' but it would prevent accidental deletion of the named files. So you would just preserve your most important files, as a last resort.

There could be advantages to writing preserve as simply a process that keeps the file open. This is a shorter-term solution, giving Murphy a great excuse to crash the machine; but it would not require an extra filesystem entry, and it would be trivial to include automatic warnings every so often if the file is accidentally removed. ``Mail from username... Subject: preserving "foo". To recover "foo", type "unrm foo"...'' Or I suppose the file could be re-instated in a trash directory by that process...

---Dan Bernstein, bernsten@phoenix.princeton.edu