Newsgroups: comp.unix.admin
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!think.com!snorkelwacker.mit.edu!bloom-picayune.mit.edu!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Subject: Re: Non Destructive Version of rm
In-Reply-To: navarra@casbah.acns.nwu.edu's message of 3 May 91 21:26:19 GMT
Message-ID: <JIK.91May6001507@pit-manager.mit.edu>
Sender: news@athena.mit.edu (News system)
Organization: Massachusetts Institute of Technology
References: <144@larry.UUCP> <11283@statware.UUCP>
	<1991May3.212619.21119@casbah.acns.nwu.edu>
Distribution: na
Date: Mon, 6 May 91 04:15:12 GMT
Lines: 98

  John Navarra suggests a non-destructive version of 'rm' that either
moves the deleted file into a directory such as
/var/preserve/username, which is periodically reaped by the system,
and from which the user can retrieve accidentally deleted files, or
uses a directory $HOME/tmp and does a similar thing.
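For concreteness, a centralized scheme of this kind might look roughly like the following Python sketch. The helper name `trash_file`, its `trash_dir` parameter, and the use of `shutil.move` are assumptions for illustration, not Navarra's actual script. The demo shows that once a file has been moved, only its basename survives in the trash directory, so nothing records where it came from.

```python
import os
import shutil
import tempfile


def trash_file(path, trash_dir):
    """Move `path` into one central trash directory (hypothetical helper).

    Models a scheme like $HOME/tmp or /var/preserve/username. Only the
    basename is kept, so the file's original directory is not recorded.
    """
    os.makedirs(trash_dir, exist_ok=True)
    dest = os.path.join(trash_dir, os.path.basename(path))
    # shutil.move() tries os.rename() first and falls back to
    # copy-then-delete when trash_dir is on another filesystem.
    shutil.move(path, dest)
    return dest


def demo():
    """Trash a file from a nested source directory."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "project", "src")
    os.makedirs(src)
    victim = os.path.join(src, "main.c")
    open(victim, "w").close()
    dest = trash_file(victim, os.path.join(root, "trash"))
    # The "project/src" part of the original location is gone.
    return os.path.relpath(dest, root)
```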

  He points out two drawbacks of leaving a deleted file in the same
directory it lived in before it was deleted.  First of all, this
requires that the entire directory tree be searched in order to reap
deleted files, which is slower than searching just one directory.
Second, the files show up when the "-a" or "-A" flag to ls is used
to list the files in a directory.
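The in-place alternative can be sketched as below. This is a hypothetical illustration, not the code of Athena's delete: it assumes deleted files are marked by renaming `name` to `.#name` in the same directory (the prefix is an assumption, chosen to match the nightly `.#` scan mentioned in point 5 below), and the names `delete_in_place`, `undelete_in_place`, and `reap` are invented here. It makes both drawbacks visible: `reap` has to walk the entire tree with `os.walk`, and the marked file is a dotfile, so it appears in `ls -a` output.

```python
import os
import shutil
import tempfile
import time

PREFIX = ".#"  # assumed marker for a deleted file


def delete_in_place(path):
    """Mark `path` as deleted by renaming it within its own directory."""
    head, tail = os.path.split(path)
    marked = os.path.join(head, PREFIX + tail)
    os.rename(path, marked)  # same directory, so no cross-filesystem copy
    return marked


def undelete_in_place(marked):
    """Reverse delete_in_place(): strip the marker from the name."""
    head, tail = os.path.split(marked)
    if not tail.startswith(PREFIX):
        raise ValueError(f"{marked!r} is not marked as deleted")
    original = os.path.join(head, tail[len(PREFIX):])
    os.rename(marked, original)
    return original


def reap(top, max_age_seconds, now=None):
    """Walk the whole tree under `top`, purging old deleted entries.

    Age is taken from st_ctime, assuming rename() updated it; POSIX
    leaves that unspecified, so a real tool may need another clock.
    """
    now = time.time() if now is None else now
    reaped = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in list(dirnames) + filenames:
            if not name.startswith(PREFIX):
                continue
            full = os.path.join(dirpath, name)
            if now - os.lstat(full).st_ctime <= max_age_seconds:
                continue
            if name in dirnames and not os.path.islink(full):
                shutil.rmtree(full)
                dirnames.remove(name)  # prune so os.walk skips it
            else:
                os.remove(full)  # plain file or symlink
            reaped.append(full)
    return reaped


def demo():
    """Delete, list, undelete, delete again, then reap 'eight days later'."""
    root = tempfile.mkdtemp()
    subdir = os.path.join(root, "a", "b")
    os.makedirs(subdir)
    victim = os.path.join(subdir, "notes.txt")
    open(victim, "w").close()
    marked = delete_in_place(victim)
    listing = os.listdir(subdir)  # like "ls -A": the marked file shows up
    restored = undelete_in_place(marked)
    delete_in_place(restored)
    week = 7 * 24 * 60 * 60
    reaped = reap(root, week, now=time.time() + 8 * 24 * 60 * 60)
    return listing, restored == victim, [os.path.relpath(p, root) for p in reaped]
```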

  A design similar to his was considered when we set about building
the non-destructive rm currently in use (as "delete") at Project
Athena and available in the comp.sources.misc archives.  There were
several reasons why we chose to leave deleted files in the same
directory rather than adopt Navarra's approach.  They include:

1. In a distributed computing environment, it is not practical to
   assume that a world-writeable directory such as /var/preserve will
   exist on all workstations, and be accessible identically from all
   workstations (i.e. if I delete a file on one workstation, I must be
   able to undelete it on any other workstation; one of the tenets of
   Project Athena's services is that, as much as possible, they must
   not differ when a user moves from one workstation to another).
   Furthermore, the "delete" program cannot run setuid in order to
   have access to the directory, both because setuid programs are a
   bad idea in general, and because setuid has problems in remote
   filesystem environments (such as Athena's).  Using $HOME/tmp
   alleviates this problem, but there are others....

2. (This is a big one.) We wanted to ensure that the interface for
   delete would be as close as possible to that of rm, including
   recursive deletion and other such features.  Furthermore, we
   wanted to ensure that undelete's interface would be close to
   delete's and just as functional.  If I do "delete -r" on a directory
   tree, then "undelete -r" on that same filename should restore it,
   as it was, in its original location.

   Navarra's scheme cannot do that -- his script stores no information
   about where files lived originally, so users must undelete files by
   hand.  If he were to attempt to modify it to store such
   information, he would have to either (a) copy entire directory
   trees to other locations in order to store their directory tree
   state, or (b) munge the filenames in the deleted file directory in
   order to indicate their original locations, and search for
   appropriate patterns in filenames when undeleting, or (c) keep a
   record file in the deleted file directory of where all the files
   came from.

   Each of these approaches has problems.  (a) is slow, and can be
   unreliable.  (b) might break in the case of funny filenames that
   confuse the parser in undelete, and undelete is slow because it has
   to do pattern matching on every filename when doing recursive
   undeletes, rather than just opening and reading directories.  (c)
   introduces all kinds of locking problems -- what if two processes
   try to delete files at the same time?

3. If all of the deleted files are kept in one directory, the
   directory gets very large.  This makes searching it slower, and
   wastes space (since the directory will not shrink when the files
   are reaped from it or undeleted).

4. My home directory is mounted automatically under /mit/jik, but
   someone else may choose to mount it on /mnt, or I may choose to do
   so.  The undeletion process must be independent of mount point, and
   therefore storing the original paths of files when deleting them
   will fail if a different mount point is later used.  Using the
   filesystem hierarchy itself is the only way to ensure mount-point
   independent operation of the system.

5. It is not expensive to scan the entire tree for deleted files to
   reap, since most systems already run such scans every night,
   looking for core files, *~ files, etc.  In fact, many Unix systems
   come bundled with a crontab that searches for # and .# files every
   night by default.

6. If I delete a file in our source tree, why should the deleted
   version take up space in my home directory, rather than in the
   source tree?  Furthermore, if the source tree is on a different
   filesystem, the file can't simply be rename()d to put it into my
   deleted file directory, it has to be copied.  That's slow.  Again,
   using the filesystem hierarchy avoids these problems, since
   rename() within a directory always works (although I believe
   renaming a non-empty directory might fail on some systems; the
   vendors of such systems deserve to be shot :-).

7. Similarly, if I delete a file in a project source tree that many
   people work on, then other people should be able to undelete the
   file if necessary.  If it's been put into my home directory, in a
   temporary location which presumably is not world-readable, they
   can't.  They probably don't even know who deleted it.
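Points 2, 4, and 6 can be illustrated together with a hypothetical recursive pair, `delete_r` and `undelete_r`, sketched in Python. It assumes deleted entries are marked with a `.#` name prefix (an illustrative convention, not necessarily what Athena's delete does) and that each entry is renamed within its own directory, bottom-up. Because no absolute path is ever stored, the demo can delete a tree through one path and restore it through another; a symlink stands in for the second mount point, which is an assumption of the demo rather than a real remount.

```python
import os
import tempfile

PREFIX = ".#"  # hypothetical deleted-entry marker


def delete_r(path):
    """Recursively mark `path` and everything under it as deleted.

    Each entry is renamed inside its own directory, so no original path
    is stored anywhere: the tree itself records where things came from.
    """
    path = os.path.normpath(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for name in os.listdir(path):
            if not name.startswith(PREFIX):
                delete_r(os.path.join(path, name))
    head, tail = os.path.split(path)
    marked = os.path.join(head, PREFIX + tail)
    os.rename(path, marked)
    return marked


def undelete_r(path):
    """Restore `path` (given by its original name) and its contents."""
    path = os.path.normpath(path)
    head, tail = os.path.split(path)
    os.rename(os.path.join(head, PREFIX + tail), path)
    if os.path.isdir(path) and not os.path.islink(path):
        for name in os.listdir(path):
            if name.startswith(PREFIX):
                undelete_r(os.path.join(path, name[len(PREFIX):]))
    return path


def demo():
    """Delete a tree via one path, undelete it via a symlinked path."""
    root = tempfile.mkdtemp()
    proj = os.path.join(root, "home", "proj")
    os.makedirs(os.path.join(proj, "src"))
    with open(os.path.join(proj, "src", "main.c"), "w") as fh:
        fh.write("int main(void) { return 0; }\n")
    # A symlink stands in for a second mount point of the same tree.
    os.symlink(os.path.join(root, "home"), os.path.join(root, "mnt"))
    delete_r(proj)
    hidden = sorted(os.listdir(os.path.join(root, "home")))
    undelete_r(os.path.join(root, "mnt", "proj"))
    with open(os.path.join(proj, "src", "main.c")) as fh:
        return hidden, fh.read()
```

One simplification: `undelete_r` also restores entries inside the tree that had already been deleted before `delete_r` ran; a real tool would need some way to tell the two apart.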

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710