Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!uflorida!gatech!emory!arnold From: arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) Newsgroups: comp.unix.questions Subject: Re: Computational complexity of rm & ls Message-ID: <3804@emory.mathcs.emory.edu> Date: 10 Mar 89 18:17:10 GMT References: <9000012@m.cs.uiuc.edu> Reply-To: arnold@emory.UUCP (Arnold D. Robbins {EUCC}) Organization: Emory University Lines: 36 In article <9000012@m.cs.uiuc.edu> wsmith@m.cs.uiuc.edu writes: >The 4.3 Unix >appeared to be quadratic in system time for this operation, while the 4.2 Unix >was cubic or worse in the amount of system time required! > >Surely rm can do better than cubic, can't it? "rm *" seems to be a common >enough idiom that rm could be optimized to handle that case better. I would >like rm to be linear in system time. Is that possible? This is not really an 'rm' issue, it's a file system issue. The shell (pick a shell, any shell, sh, csh, ksh) sorts the expansion of *; the order the filenames are in the directory is irrelevant to rm. Rm then proceeds to unlink each file in turn, probably randomly deleting entries from the middle of the directory. 4.3 made a number of changes to directory handling over 4.2, one of which was to perform compaction of the blocks in a directory when possible, as entries are deleted. I'll bet that if you were to rm `ls -f` and rm `ls -f | tac` you'd see very interesting behavior (i.e. remove files starting with the first and last files in the directory, respectively), since each unlink has to search the directory from scratch to find the entry. (Actually, on 4.3 the last one would be pathologically bad, while the first would be really quick, due to the directory caching in namei.) I guess the point here is to know where to point the finger; since it was all system time, it's not something that can really be fixed in rm. Your 4.2 vendor should upgrade their filesystem code to 4.3. -- Unix is a Registered | Arnold Robbins -- Emory University Computing Center Bell of AT&T Trademark | DOMAIN: arnold@unix.cc.emory.edu Laboratories. | UUCP: gatech!emory!arnold PHONE: +1 404 727-7636 -- Donn Seeley | BITNET: arnold@emoryu1 FAX: +1 404 727-2599