Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!think.com!snorkelwacker.mit.edu!bloom-picayune.mit.edu!athena.mit.edu!jik
From: jik@athena.mit.edu (Jonathan I. Kamens)
Newsgroups: comp.unix.admin
Subject: Re: Non Destructive Version of rm
Message-ID: <1991May7.093346.16946@athena.mit.edu>
Date: 7 May 91 09:33:46 GMT
References: <11283@statware.UUCP> <1991May3.212619.21119@casbah.acns.nwu.edu> <1991May6.072447.21943@casbah.acns.nwu.edu>
Sender: news@athena.mit.edu (News system)
Distribution: na
Organization: Massachusetts Institute of Technology
Lines: 188

In article <1991May6.072447.21943@casbah.acns.nwu.edu>, navarra@casbah.acns.nwu.edu (John 'tms' Navarra) writes:
|> The fact that among Athena's 'tenets' is that of similarity from
|> workstation to workstation is both good and bad in my opinion. True, it
|> is reasonable to expect that Unix will behave the same on similar workstations
|> but one of the fundamental benefits of Unix is that the user gets to create
|> his own environment.

Our approach in no way prevents the user from creating his own
environment.

|> Thus, we can argue the advantages and disadvantages of
|> using an undelete utility but you seem to be of the opinion that
|> non-standard changes are not beneficial

No. What I am arguing is that users should have *access* to a similar
environment on all workstations. They can do whatever the hell they want
with that environment when they log in. They can use X, or not use X.
They can use mwm, or twm, or uwm, or gwm, or whatever-the-hell-wm they
want. They can use /bin/csh, or /bin/sh, or (more recently) zsh, or a
shell installed in a contributed software locker or in their home
directory. They can configure their accounts as much as anyone at any
Unix site, if not more.

|> and I argue that most users don't use
|> a large number of different workstations

There are over 1000 workstations at Project Athena.
Most users will log into a different workstation every time they log in.
The biggest cluster has almost 100 workstations in it.

Please remember that your environment is not everyone's environment. I
am trying to explain why the design chosen by Project Athena was
appropriate for Project Athena's environment; your solution may be
appropriate for your environment (although I still believe that it does
have problems).

Furthermore, I still believe that Project Athena's approach is more
general than yours, for the simple reason that our approach will work in
your environment, but your approach will not work in our environment.

|> and that we shouldn't reject a
|> better method just because it isn't standard.

The term "standard" has no meaning here, since we're talking about
implementing something that doesn't come "standard" with Unix.

|> I don't understand your setuid argument. All you do is have a directory
|> called /var/preserve/navarra and have each persons directory unaccessible to
|> others (or possibly have the sticky bit set on too) so that only the owner
|> of the file can undelete it.

In order to be accessible from multiple workstations, the /var/preserve
filesystem has to be a remote filesystem (e.g. NFS or AFS) mounted on
each workstation. Mounting one filesystem, from one fileserver, on over
1000 workstations is not practical. Furthermore, it does not scale (e.g.
what if there are 10000 workstations rather than 1000?), and another of
Project Athena's main design goals was scalability.

Finally, all of the remote file access at Athena is authenticated using
Kerberos (because, without something like Kerberos, both NFS and AFS are
insecure when public workstations can be rebooted by users). All users
would therefore have to authenticate themselves to /var/preserve's
fileserver in order to access it (to delete or undelete files). Storing
authentication for every user currently logged in is quite difficult for
one fileserver to deal with.

We have over 10000 users at Project Athena. This means that either (a)
there will have to be over 10000 subdirectories of /var/preserve, or (b)
the directories will have to be created as they are needed, which means
either a world-writeable /var/preserve or a setuid program that can
create directories in a non-world-writeable directory. And setuid
programs don't work with authenticated remote filesystems, which was my
original point.

Yes, many of these concerns are specific to Project Athena. But, as I
said, what I'm trying to explain is not why all of the problems I
mentioned with your scheme are problems everywhere (although some of
them are), but rather why all of them are problems at Project Athena.

|> Ahh, we can improve that. I can write a program called undelete that
|> will look at the filename argument and by default undelete it to $HOME
|> but can also include a second argument -- a directory -- to move the
|> undeleted material. I am pretty sure I could (or some better programmer
|> than I) could get it to move more than one file at a time or even be
|> able to do something like: undelete *.c $HOME/src and move all files
|> in /var/preserve/username with .c extensions to your src dir.
|> And if you don't have a src dir -- it will make one for you.

I'm sorry, but this does nothing to address my concerns. Leaving the
files in the directory in which they were deleted preserves the state
indicating where they were originally, so that they can be restored to
exactly that location without the user having to specify it. Your way of
attempting the same thing is a kludge at best; it does *not* accomplish
the same thing, but rather a crude imitation of it.

|> As far as rm -r and undelete -r go, perhaps the best way to handle
|> this is when the -r option is called, the whole dir in which you are
|> removing files is just moved to /preserve. And then an undelete -r dir
|> dir2 where dir2 is a destination dir, would restore all those files.

What if I do "delete -r foo" and then realize that I want to restore the
file "foo/bar/baz/frelt" without restoring anything else? My "delete"
deletes a directory recursively by renaming the directory and all of its
contents with ".#" prefixes, recursively. Undeleting a specific file
several levels deep is therefore trivial, and my delete does it using
only rename() calls, which are quite fast.

Once again your system runs into the problem of /preserve being on a
different filesystem (if it can't be, then you have restricted all of
your files to reside on one filesystem), in which case copying directory
structures is slow as hell and can be unreliable. Since my system does
no inter-filesystem copying, it is fast (which was another requirement
of the design -- delete cannot be significantly slower than /bin/rm).

Let's see what your system has to do to undelete "foo/bar/baz/frelt".
First, it has to create the undeleted directory "foo". It has to give it
the same permissions as the deleted "foo", but it can't just rename()
the "foo" in /preserve, since that might be across filesystems and since
it doesn't want all of the *other* deleted files in /preserve/foo to
show up undeleted. Then, it has to do the same thing with "foo/bar" and
"foo/bar/baz". Then, it has to put "foo/bar/baz/frelt" back, copying it
(slowly).

It seems to me that your system can reap deleted files quickly, but can
delete or undelete files rather slowly. My system reaps files slowly
(using a nightly "find" that many Unix sites already run), but runs very
quickly from the user's point of view. Tell me, whose time is more
important at your site, the user's or the computer's (late at night)?

|> Here are some more problems. Like rm, undelete would operate by looking
|> thru /preserve.
|> But if rm did not store files in that dir but instead stored
|> them as .# in the current directory, then undelete would likewise have to
|> start looking in the current dir and work its way thru the directory structure
|> looking for .# files that matched a filename argument UNLESS you gave it
|> a starting directory as an argument in which case it would start there. That
|> seems like a lot of hassle to me.

Um, "undelete" takes exactly the same syntax as "delete". If you give it
an absolute pathname, it looks in that pathname. If you don't, it looks
relative to the current directory. If it can't find the file there, then
the file cannot be undeleted. This functionality is identical to the
functionality of virtually every other Unix file utility. The system is
not expected to be able to find a file in an entire filesystem, given
just its name. The user is expected to know where the file is. That's
how Unix works. Furthermore, the state is in the filesystem, so that if
the user forgets where something is, he can use "find" or something to
find it.

It seems to me that Athena's design conforms more to the Unix paradigm
than yours.

|> You get a two day grace period -- then they are GONE! This is still faster
|> than searching thru the current directory (in many cases) looking for .# files
|> to undelete.

The speed of searching is negligible. The speed of copying the file,
possibly very large, from another filesystem, is not. My program will
*always* run in negligible time; yours will not.

|> SO now when I do an ls -las -- guess what!

You are one of the few people who has ever told me that he regularly
uses the "-a" flag to ls. Most people don't -- that's why ls doesn't
display dotfiles by default. Renaming files with a ".#" prefix to
indicate that they can be removed and to hide them is older than
Athena's delete program; that's why many Unix sites already search for
".#" files.
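
To make the rename-in-place scheme described above concrete, here is a
minimal sketch in Python. It is not the source of Athena's delete; the
function names `delete` and `undelete` and the constant `PREFIX` are
hypothetical, and the sketch assumes POSIX-style paths and ignores name
collisions with entries that were deleted earlier. Every rename stays
inside one directory, so no data is ever copied between filesystems.

```python
import os

PREFIX = ".#"  # marker for deleted entries, as described in the post


def delete(path):
    """Mark `path` as deleted by renaming it with a ".#" prefix.

    Directories are marked recursively, children first, so that every
    entry underneath can later be restored on its own.
    """
    path = os.path.normpath(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for name in os.listdir(path):
            # Entries already marked as deleted are left alone.
            if not name.startswith(PREFIX):
                delete(os.path.join(path, name))
    parent, name = os.path.split(path)
    marked = os.path.join(parent, PREFIX + name)
    # Source and target share a parent directory, so this never
    # crosses a filesystem boundary.
    os.rename(path, marked)
    return marked


def undelete(path):
    """Restore `path`, given under its original (unmarked) name.

    Deleted ancestor directories are unmarked on the way down, but
    their other contents stay marked as deleted.
    """
    path = os.path.normpath(path)
    current = os.sep if os.path.isabs(path) else ""
    for part in (p for p in path.split(os.sep) if p):
        live = os.path.join(current, part)
        marked = os.path.join(current, PREFIX + part)
        if not os.path.lexists(live):
            if not os.path.lexists(marked):
                raise FileNotFoundError(path + ": cannot be undeleted")
            os.rename(marked, live)
        current = live
    return path
```

With this sketch, `delete("foo")` followed by
`undelete("foo/bar/baz/frelt")` brings back "foo", "foo/bar",
"foo/bar/baz" and "frelt" using four renames, while everything else
under "foo" keeps its ".#" prefix and stays hidden.
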

If you use "ls -a" so often that it is a problem for you, *and* if you
delete so many files that you will often see deleted files when you do
"ls -a", then don't use delete. You can't please all of the people all
of the time. But I would venture to say that new users, inexperienced
users, the users that "delete" is (for the most part) intended to
protect, are not going to have your problems.

|> make a shell variable RMPATH and you can set it to whatever PATH
|> you want. The default will be /var/preserve but you can set it to $HOME/tmp
|> or maybe perhaps it could work like the PS1 variable and have a $PWD
|> options in which case it is set to your current directory. Then when you
|> rm something or undelete something, the RMPATH will be checked.

This solves pretty much none of the problems I mentioned, and introduces
others. What if you delete something in one of your accounts that has a
weird RMPATH, and then want to undelete it later and can't remember who
you were logged in as when you deleted it? You've then got deleted files
scattered all over your filespace, and in fact they can be in places
totally unrelated to where they were originally.

It makes much more sense to leave them where they were when they were
deleted -- if you know what the file is about, you probably know in
general where to look for it.

-- 
Jonathan Kamens                         USnail:
MIT Project Athena                      11 Ashford Terrace
jik@Athena.MIT.EDU                      Allston, MA 02134
Office: 617-253-8085                    Home: 617-782-0710