Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!think.com!snorkelwacker.mit.edu!bloom-picayune.mit.edu!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.admin
Subject: Re: Non Destructive Version of rm
Message-ID: <1991May7.093346.16946@athena.mit.edu>
Date: 7 May 91 09:33:46 GMT
References: <11283@statware.UUCP> <1991May3.212619.21119@casbah.acns.nwu.edu> <JIK.91May6001507@pit-manager.mit.edu> <1991May6.072447.21943@casbah.acns.nwu.edu>
Sender: news@athena.mit.edu (News system)
Distribution: na
Organization: Massachusetts Institute of Technology
Lines: 188

In article <1991May6.072447.21943@casbah.acns.nwu.edu>, navarra@casbah.acns.nwu.edu (John 'tms' Navarra) writes:
|>      	The fact that among Athena's 'tenets' is that of similarity from
|>  workstation to workstation is both good and bad in my opinion. True, it
|>  is reasonable to expect that Unix will behave the same on similar workstations
|>  but one of the fundamental benefits of Unix is that the user gets to create
|>  his own environment.

  Our approach in no way prevents the user from creating his own environment.

|>  Thus, we can argue the advantages and disadvantages of
|>  using an undelete utility but you seem to be of the opinion that non-
|>  standard changes are not beneficial

  No.  What I am arguing is that users should have *access* to a similar
environment on all workstations.  They can do whatever the hell they want
with that environment when they log in.  They can use X, or not use X. 
They can use mwm, or twm, or uwm, or gwm, or whatever-the-hell-wm they want. 
They can use /bin/csh, or /bin/sh, or (more recently) zsh, or a shell
installed in a contributed software locker or in their home directory.  They
can configure their accounts as much as anyone at any Unix site, if not more.

|>  and I argue that most users don't use
|>  a large number of different workstations

  There are over 1000 workstations at Project Athena.  Most users will log
into a different workstation every time they log in.  The biggest cluster has
almost 100 workstations in it.

  Please remember that your environment is not everyone's environment.  I am
trying to explain why the design chosen by Project Athena was appropriate for
Project Athena's environment; your solution may be appropriate for your
environment (although I still believe that it does have problems). 
Furthermore, I still believe that Project Athena's approach is more
generalized than yours, for the simple reason that our approach will work in
your environment, but your approach will not work in our environment.

|>  and that we shouldn't reject a 
|>  better method just because it isn't standard.

  The term "standard" has no meaning here, since we're talking about
implementing something that doesn't come "standard" with Unix.

|> 	I don't understand your setuid argument. All you do is have a directory
|>  called /var/preserve/navarra and have each persons directory unaccessible to
|>  others (or possibly have the sticky bit set on too) so that only the owner
|>  of the file can undelete it.

  In order to be accessible from multiple workstations, the /var/preserve
filesystem has to be a remote filesystem (e.g. NFS or AFS) mounted on each
workstation.

  Mounting one filesystem, from one fileserver, on over 1000 workstations is
not practical.  Furthermore, it does not scale (e.g. what if there are 10000
workstations rather than 1000?), and another of Project Athena's main design
goals was scalability.  Finally, since all of the remote file access at Athena
is authenticated using Kerberos (because both NFS and AFS are insecure when
public workstations can be rebooted by users without something like Kerberos),
all users would have to authenticate themselves to /var/preserve's fileserver
in order to access it (to delete or undelete files).  Storing authentication
for every user currently logged in is quite difficult for one fileserver to
deal with.

  We have over 10000 users at Project Athena.  This means that either (a)
there will have to be over 10000 subdirectories of /var/preserve, or (b) the
directories will have to be created as they are needed, which means either a
world-writeable /var/preserve or a setuid program that can create directories
in a non-world-writeable directory.  And setuid programs don't work with
authenticated remote filesystems, which was my original point.

  Yes, many of these concerns are specific to Project Athena.  But, as I said,
I'm not trying to explain why every problem I mentioned with your scheme is a
problem everywhere (although some of them are), but rather why all of them
are problems at Project Athena.

|>     Ahh, we can improve that. I can write a program called undelete that
|>     will look at the filename argument and by default undelete it to $HOME
|>     but can also include a second argument -- a directory -- to move the
|>     undeleted material. I am pretty sure I could (or some better programmer
|>     than I) could get it to move more than one file at a time or even be
|>     able to do something like: undelete *.c $HOME/src and move all files
|>     in /var/preserve/username with .c extensions to your src dir.
|>     And if you don't have an src dir -- it will make one for you.

  I'm sorry, but this does nothing to address my concerns.  Leaving the files
in the directory in which they were deleted preserves the state indicating
where they were originally, so that they can be restored to exactly that
location without the user having to specify it.

  Your way of doing it is a kludge at best; it does *not* accomplish the same
thing, but only a crude imitation of it.

|> 	As far as rm -r and undelete -r go, perhaps the best way to handle
|>     this is when the -r option is called, the whole dir in which you are 
|>     removing files is just moved to /preserve. And then an undelete -r dir 
|>     dir2 where dir2 is a destination dir,  would restore all those files.

  What if I do "delete -r foo" and then realize that I want to restore the
file "foo/bar/baz/frelt" without restoring anything else?  My "delete" deletes
a directory recursively by renaming the directory and all of its contents with
".#" prefixes, recursively.  Undeleting a specific file several levels deep is
therefore trivial, and my delete does it using only rename() calls, which are
quite fast.
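
  To make the mechanism concrete, here is a minimal Python sketch of a
recursive delete that uses nothing but rename calls.  It is an illustration
of the idea described above, not the actual delete source; the function name
is hypothetical, and the sketch assumes no ".#" entry of the same name is
already present.

```python
import os

PREFIX = ".#"  # marker prefix described in the post


def delete(path):
    """Mark path as deleted by renaming it with a ".#" prefix.

    For a directory, everything below it is marked first, so each
    child is renamed while its parent still has its original name.
    """
    path = os.path.normpath(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for name in os.listdir(path):
            # Entries that already carry the marker were deleted earlier.
            if not name.startswith(PREFIX):
                delete(os.path.join(path, name))
    parent, name = os.path.split(path)
    # Assumption: no existing ".#name" collides with the new name.
    os.rename(path, os.path.join(parent, PREFIX + name))
```

  Every step is a rename within the same directory, so nothing is copied and
nothing crosses a filesystem boundary.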

  Once again your system runs into the problem of /preserve being on a
different filesystem (if it can't be, then you have restricted all of your
files to reside on one filesystem), in which case copying directory structures
is slow as hell and can be unreliable.  Since my system does no
inter-filesystem copying, it is fast (which was another requirement of the
design -- delete cannot be significantly slower than /bin/rm).
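
  The cost difference can be sketched in Python.  A rename only works within
one filesystem; across filesystems it fails with EXDEV and the data has to be
copied.  The function below is a hypothetical illustration of what a
/preserve-style scheme would have to do for a single regular file, not code
from either program.

```python
import errno
import os
import shutil


def move_to_preserve(src, dst):
    """Move a regular file and report which mechanism was needed."""
    try:
        os.rename(src, dst)  # cheap: only directory entries change
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Different filesystem: every byte has to be copied, then the
    # original removed.  This is the slow path.
    shutil.copy2(src, dst)
    os.remove(src)
    return "copy"
```

  When source and destination share a filesystem the function returns
"rename"; a central /preserve on a remote fileserver would take the "copy"
branch for nearly every deletion.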

  Let's see what your system has to do to undelete "foo/bar/baz/frelt". 
First, it has to create the undeleted directory "foo".  It has to give it the
same permissions as the deleted "foo", but it can't just rename() the "foo" in
/preserve, since that might be across filesystems and since it doesn't want
all of the *other* deleted files in /preserve/foo to show up undeleted.  Then,
it has to do the same thing with "foo/bar" and "foo/bar/baz".  Then, it has to
put "foo/bar/baz/frelt" back, copying it (slowly).

  It seems to me that your system can reap deleted files quickly, but can
delete or undelete files rather slowly.  My system reaps files slowly (using a
nightly "find" that many Unix sites already run), but runs very quickly from
the user's point of view.  Tell me, whose time is more important at your site,
the user's or the computer's (late at night)?
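
  For illustration, the nightly reaping pass could look roughly like the
Python sketch below.  It stands in for the "find" job mentioned above; the
function name, the three-day default, and the choice of modification time as
the age test are all assumptions, not details of any site's actual setup.

```python
import os
import shutil
import time

PREFIX = ".#"


def reap(root, max_age_days=3, now=None):
    """Remove ".#" entries under root older than max_age_days.

    Returns the list of removed paths.  Age is judged by mtime,
    which is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            if not name.startswith(PREFIX):
                continue
            full = os.path.join(dirpath, name)
            if os.lstat(full).st_mtime >= cutoff:
                continue  # still inside the grace period
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full)
            else:
                os.remove(full)
            if name in dirnames:
                dirnames.remove(name)  # do not descend into it
            removed.append(full)
    return removed
```

  The walk is slow, but it runs when nobody is waiting on it.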

|>  	Here are some more problems. Like rm, undelete would operate by looking
|>  thru /preserve. But if rm did not store files in that dir but instead stored
|>  them as .# in the current directory, then undelete would likewise have to
|>  start looking in the current dir and work its way thru the directory structure
|>  looking for .# files that matched a filename argument UNLESS you gave it
|>  a starting directory as an argument in which case it would start there. That
|>  seems like a lot of hassle to me.

  Um, "undelete" takes exactly the same syntax as "delete".  If you give it an
absolute pathname, it looks in that pathname.  If you don't, it looks relative
to the current directory.  If it can't find the file there, then the file
cannot be undeleted.
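
  A minimal Python sketch of that lookup follows.  It is a guess at the
general shape, not the actual undelete source: the function name is
hypothetical, and restoring marked parent directories along the way is an
assumption about how a file several levels deep would be recovered.

```python
import os

PREFIX = ".#"


def undelete(path):
    """Restore path by stripping the ".#" marker where it is found.

    The path is taken as given (absolute, or relative to the current
    directory).  Each component that exists only in its marked form
    is renamed back; siblings are left untouched.
    """
    path = os.path.normpath(path)
    current = os.sep if os.path.isabs(path) else ""
    for part in path.split(os.sep):
        if not part:
            continue
        plain = os.path.join(current, part)
        marked = os.path.join(current, PREFIX + part)
        if not os.path.lexists(plain):
            if not os.path.lexists(marked):
                raise FileNotFoundError(path + ": cannot be undeleted")
            os.rename(marked, plain)  # one rename per component
        current = plain
```

  With this sketch, undeleting "foo/bar/baz/frelt" after "delete -r foo"
costs four renames, and everything else under "foo" stays marked as deleted.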

  This functionality is identical to the functionality of virtually every
other Unix file utility.  The system is not expected to be able to find a file
in an entire filesystem, given just its name.  The user is expected to know
where the file is.  That's how Unix works.  Furthermore, the state is in the
filesystem, so that if the user forgets where something is, he can use "find"
or something to find it.  It seems to me that Athena's design conforms more to
the Unix paradigm than yours.

|>    You get a two day grace period -- then they are GONE! This is still faster
|>  than searchin thru the current directory (in many cases) looking for .# files
|>  to undelete. 

  The time spent searching is negligible.  The time spent copying the file,
possibly very large, from another filesystem, is not.  My program will
*always* run in negligible time; yours will not.

|>  SO now when I do an ls -las -- guess what!

  You are one of the few people who has ever told me that he regularly uses
the "-a" flag to ls.  Most people don't -- that's why ls doesn't display
dotfiles by default.  Renaming files with a ".#" prefix to indicate that they
can be removed and to hide them is older than Athena's delete program; that's
why many Unix sites already search for ".#" files.

  If you use "ls -a" so often that it is a problem for you, *and* if you
delete so many files that you will often see deleted files when you do "ls
-a", then don't use delete.  You can't please all of the people all of the
time.  But I would venture to say that new users, inexperienced users, the
users that "delete" is (for the most part) intended to protect, are not going
to have your problems.

|>     make a shell variable RMPATH and you can set it to whatever PATH 
|>  you want. The default will be /var/preserve but you can set it to $HOME/tmp
|>  or maybe perhaps it could work like the PS1 variable and have a $PWD  
|>  options in which case it is set to your current directory. Then when you
|>  rm something or undelete something, the RMPATH will be checked.

  This solves pretty much none of the problems I mentioned, and introduces
others.  What if you delete something in one of your accounts that has a weird
RMPATH, and then want to undelete it later and can't remember who you were
logged in as when you deleted it?  You've then got deleted files scattered all
over your filespace, and in fact they can be in places totally unrelated to
where they were originally.  It makes much more sense to leave them where they
were when they were deleted -- if you know what the file is about, you
probably know in general where to look for it.

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710