Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!sri-spam!mordor!lll-lcc!ptsfa!amdahl!drivax!braun
From: braun@drivax.UUCP
Newsgroups: comp.unix.wizards
Subject: make x + mv x z + rm z = crash
Message-ID: <875@drivax.UUCP>
Date: Tue, 3-Feb-87 20:12:07 EST
Article-I.D.: drivax.875
Posted: Tue Feb 3 20:12:07 1987
Date-Received: Thu, 5-Feb-87 06:37:00 EST
Reply-To: braun@drivax.UUCP (Karl T. Braun (kral))
Organization: Digital Research, Inc.
Lines: 53

We managed to crash our Vax 11/780 this week, and I was wondering if anybody could help me understand what went on. I have some strong suspicions, but if they are right, they point out some pretty weak places in Unix. They are the kind my co-workers point at and say "See, Unix isn't a REAL operating system! Real Operating Systems wouldn't do that", then go back to the VMS machine, snickering.

Anyway, the symptoms were as follows. A user called and said that some of his processes were hung, and although he had a prompt, he couldn't kill -9 any of them. Another user was having the same problem. It turned out that no one could kill any of these processes. But some processes could be killed: only those in STAT 'D' (or 'DW') could NOT be killed.

This makes sense to me, as I assume that a process has to come off of an event list before it can be killed. If this assumption is correct, it seems like a weak point, but I can appreciate the difficulty of killing processes that are waiting on events.

Anyway, the system finally ground to a halt, although not for some time (about 15 minutes after the first report came in).

It turns out that no one had been writing to the file system that was being used by the processes in question.

One of the processes was a compile in the assembly phase. This was not the native Unix compiler, but a cross compiler for another architecture. It is only slightly suspected as the culprit, and only because it is a 3rd-party product with a relatively short history.
A stronger candidate was a combination of 'make' processes and a .logout process which, by a strong coincidence, happened to be executing at the same time. The combination of processes produced the following tasks:

1: compile, creating file x.o
2: mv x.o /work/user/trashcan/x.o
3: rm /work/user/trashcan/x.o

(1) was the result of 'make'ing x. (2) was the result of the user's "rm" command being aliased to "mv \!* ~/trashcan". (3) was the result of the user's .logout containing "/bin/rm /work/user/trashcan/*", and of the user logging out while (1) and (2) were running.

This makes me think that the file system either bogged down or got confused trying to chase its own tail. And although I don't have a solution to the problem offhand, I think that this type of thing shouldn't bring a system to its knees.

Do you think I have made an adequate assessment of the problem? Do you agree or disagree with my opinion of it? Mail or Followup as appropriate.
--
kral 408/647-6112 ...!{amdahl,ihnp4}!drivax!braun