Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!emory!stiatl!pda
From: pda@stiatl.UUCP (Paul D. Anderson)
Newsgroups: comp.unix.questions
Subject: SUMMARY: which is more efficient...
Message-ID: <9199@stiatl.UUCP>
Date: 1 Mar 90 20:22:34 GMT
Reply-To: pda@stiatl.UUCP (Paul D. Anderson)
Organization: Sales Technologies Inc., "The Prototype IS the Product..."
Lines: 671

Here is the original question I posed:

|Which is more efficient (ie: cheaper and faster) in a program?
| - to chdir("/path") and then unlink("file") for each file in the ...
| - to specify the entire path in each call to unlink()?
|
|I know of namei() in the kernel, it has been suggested that doing
|the chdir() first will mean namei() will not have to be invoked as
|often.  But does not unlink() have to chase back up the directory
|structures (re-construct the directory tree) in order to ascertain
|if the correct permissions exist to allow removal of the file?  Therefore,
|would not calling unlink() with the full pathname be more efficient,
|since all information is provided to unlink() about the path?

The responses were thorough and unique, and some even included test source.
Make sure you see the timing test results near the end of this summary.

My thanks to everyone who sent me the information!

What follows are the [edited] summaries...

--
From emory!ulysses.att.com!ekrell Tue Feb 20 22:15:24 1990
Eduardo Krell, AT&T Bell Laboratories, Murray Hill, NJ

It's cheaper to chdir to that directory first and then unlink the files
using a relative pathname.  How much cheaper will depend on the version
of Unix.  It makes a bigger difference in old versions of System V.  BSD
uses a namei cache and that diminishes the benefits of doing the chdir(),
but it will still run faster.

unlink() doesn't need to check the permissions of the directory hierarchy
up to the root; it only cares about the permissions on the directories
you specified in the argument.
If you call unlink() with just a simple relative file name like "foo.c",
it will check the permissions of the current working directory, for which
there's already an inode pointer in the user structure.

--
From gatech!ames!pacbell!ctnews!unix386!dick Wed Feb 21 03:00:30 1990
Dick Hacking

Why not put all the files in their own filesystem and then do a mkfs
when you are finished with them.... that is *MUCH* faster than trying
to unlink anything.

[novel solution!]

--
From emory!cs.UMD.EDU!chris Wed Feb 21 04:40:22 1990
Chris

chdir() is more efficient.

No, unlink need chase nothing: permission to unlink files is based on
the permissions of the directory in which the files reside.  If that
directory can be written (by user/group/other, whichever matches first),
files in that directory may be unlinked.

--
From gatech!mailrus!uunet!prcrs!wrwalke Wed Feb 21 05:00:17 1990
bill.

on all systems i work on bsd4.2, 4.2, and ultrix, hp-ux, "file" and
"/full/path/to/file" are both accessed as inode #'s.  the unlink
merely drops the link count to that inode.  if it has no external
links, the inode is free'd up for later use.  the chdir() call
merely causes another inode lookup and current process info change.
in other words, the chdir() would be a waste of time.

[i presume he means it would be a waste of time if I only deleted one
file in a directory before moving to the next.]

another note, be very careful in unlinking dirs, you are freeing an
inode of a file that still must remain to physically access the other
files under it.  fsck will fix it, but it can be messy on the screen
and cause a shortage of inodes (in sysV at least), as the numbers are
not free'd.

--
From stiatl.uucp!bgi Tue Feb 20 16:32:25 1990
Brad

Well... according to _The Design of the UNIX Operating System_ by
Maurice J. Bach, the most efficient is to chdir first.  Reason being
that unlink doesn't have to reconstruct the path back to root checking
permissions.  I was suspicious of this until I tested it myself.
[Brad performed the same test as Jon Kamens, see further down.]

--
From emory!cbnewsi!npl Wed Feb 21 15:30:09 1990
Nick Landsberg, AT&T Bell Laboratories

Just based on intuition and some knowledge of the internals, I would
venture that the "chdir" approach is MUCH faster, especially if "path"
is long.  If "path" is long then every element in the path must be
parsed by the kernel for every unlink call, the appropriate i-node
looked up, the next element found, etc.  With the chdir() approach,
the path lookup is only done once and then only the current directory
searched for every file to be unlinked.

You've stumbled into a "security hole" in the system.  The only check
done about unlinking a file (by the kernel) is if you have write
permission in the current directory, since you are changing the contents
of the current directory.  (This may not be true of all flavors of Unix,
your mileage may vary.)  Thus, "unlink()" does not traverse the directory
backwards to check all directories for write permission, you aren't
changing any of them.  Whether this is "correct" behavior is questionable,
but the other way is inefficient.

--
From gatech!uflorida!ki4pv!cdis-1!tanner Thu Feb 22 11:31:11 1990

If you unlink using the entire path each time, then you will have to
run through the entire set of directories (in a kernel routine most
likely called "namei") to find the file, do permission checks, &
check them each time.

If, on the other hand, you chdir() to the offending directory, you can
unlink the files by name without having to scan the path and read each
directory on the way.  However, you have to remember where you were
before (and change back), or relative pathnames will stop working.

--
From gatech!mailrus!uunet!attcan!telly!eci386!jmm Thu Feb 22 11:31:14 1990
John Macdonald

Why not use:

    system( "rm -rf /path/*" )
or
    system( "rm /path/*" )?

(The first for recursively removing sub-directories as well.)
Unless there was major foolishness on the part of the implementors of
your Unix system, this should be very close to optimal for large numbers
of files, as long as the number of files doesn't overflow the command
line.  In such a case you could use:

    system( "find /path -type f -print | xargs rm; rm -rf /path" )
or
    system( "find /path -type f -print | sed -e '/^\/path\/.*\//d' | xargs rm" )

although the second does have to traverse the entire tree even though
it is only removing the top level files.

--
From gatech!mit-eddie!pit-manager.MIT.EDU!jik Thu Feb 22 11:31:23 1990
jik@Athena.MIT.EDU, Office: 617-253-8495
Jonathan I. Kamens, MIT Project Athena

[doesn't unlink() have to chase back up the directory tree?]

No, unlink() doesn't have to do anything of the sort.  Directory
permissions in Unix are not based on permissions of higher-up
directories in any direct manner.  In other words, once you've managed
to get into a directory, you can perform operations in that directory
paying attention only to the permissions in that particular directory.
Example:

  pit-manager% ls -ldg foo foo/bar
  drwxrwxr-x  3 jik      wheel         512 Feb 21 11:59 foo/
  drwxrwxr-x  2 jik      wheel         512 Feb 21 11:59 foo/bar/
  pit-manager% cd foo/bar
  pit-manager% chmod 000 ../../foo
  pit-manager% touch frep
  pit-manager% ls -ldg .
  drwxrwxr-x  2 jik      wheel         512 Feb 21 12:00 ./
  pit-manager% ls -ldg ..
  d---------  3 jik      wheel         512 Feb 21 11:59 ../

So, you see, I was able to perform operations in "bar" even after I
could no longer do so in "foo".

One of the advantages of this is that if you make a directory
executable but not readable, and then put readable+executable
directories underneath it, then people can cd into those directories
and read from them if they know the names, but not if they don't
(i.e. they can't do an "ls" on the parent directory, but they can use
"cd parent/child" if they know the name of the child).
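
[The shell session above can also be run from inside a program.  What
follows is only a sketch and was not sent in by any of the respondents:
the function name unlink_under_closed_parent() and the /tmp scratch
path are invented for illustration, and it assumes a POSIX system that
has mkdtemp() and fchdir().  It makes foo/bar, sets foo to mode 000
while sitting in bar, and then creates and unlinks a file there.]

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: repeat the foo/bar experiment in a scratch
 * directory.  Returns 0 if a file could be created and unlinked in
 * "bar" while its parent "foo" had mode 000, and -1 on any failure.
 */
int unlink_under_closed_parent(void)
{
    char base[] = "/tmp/unlinkdemoXXXXXX";  /* assumed scratch location */
    char foo[64], bar[64];
    int home, fd, rc = -1;

    home = open(".", O_RDONLY);             /* remember where we were */
    if (home == -1)
        return -1;
    if (mkdtemp(base) == NULL) {
        close(home);
        return -1;
    }
    snprintf(foo, sizeof foo, "%s/foo", base);
    snprintf(bar, sizeof bar, "%s/foo/bar", base);

    if (mkdir(foo, 0775) == 0 && mkdir(bar, 0775) == 0 && chdir(bar) == 0) {
        if (chmod("..", 0) == 0) {          /* from bar, ".." is foo */
            fd = open("frep", O_CREAT | O_WRONLY, 0666);   /* touch frep */
            if (fd != -1) {
                close(fd);
                if (unlink("frep") == 0)    /* only bar's modes matter */
                    rc = 0;
            }
            chmod("..", 0775);              /* reopen foo for cleanup */
        }
    }
    if (fchdir(home) == -1)                 /* go back to where we were */
        rc = -1;
    close(home);
    rmdir(bar);
    rmdir(foo);
    rmdir(base);
    return rc;
}
```

[It returns 0 when both the create and the unlink in bar succeed, which
is what the shell session shows, and -1 if any step fails.  Run as root
it proves nothing, since root bypasses the permission checks anyway.]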
--
From gatech!BBN.COM!pineapple.bbn.com!bbn.com!rsalz Fri Feb 23 04:15:20 1990
Rich $alz

Once you chdir to the directory, to remove a file all you need is write
permission in the dir.  The whole path all the way down only counts in
the chdir.

Now, it is possible that

    cd /foo/bar/baz ; rm zap

will fail (no "r" perm in baz, I guess) while

    rm /foo/bar/baz/zap

would succeed.  Doesn't happen very often.

--
From emory!uunet.UU.NET!prcrs!wrwalke Fri Feb 23 19:40:04 1990
William Walker

boy, you did it now, we have a battle going on now as to whether
the chdir() as a system call eats more than tracing absolute
pathnames.  apparently if the dirs are rather empty, less than say
five or six files each, the consensus is that the syscall is not
efficient.  if the dir is very full, say maybe in the 20-200 file
range, the overhead would make up for the second syscall.

everyone and his brother will now be writing test code to prove a
point.

on most systems only root can remove a non-empty dir, on some systems,
even remove a busy executable.  if you are running as root, make damn
sure you don't blow a non-empty dir.  fsck will make you wish you were
sitting by the pool sipping margaritas.  (unless you run it in
"SHUT-UP" mode, -p ??).

--
From emory!virtech!cpcahil Fri Feb 23 20:15:08 1990
Conor P. Cahill

Technically, the first method [chdir()/unlink()] will be slightly more
efficient, since namei() (or lookupname()) will only have to parse the
"file" portion at unlink time.

However, you will probably not be able to demonstrate a difference in
execution time between the two because of namei() caching and/or buffer
caching.

The real difference would come if the path to file was very long, your
system was very heavily loaded, and the directories along the path were
very large (had lots of entries).

So, for most cases, you won't be able to tell the difference.
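
[Several of the replies above recommend chdir() followed by unlink() on
bare file names, with the warning that you must get back to where you
started or relative pathnames stop working.  One possible shape for
that, as a sketch only: unlink_in_dir() is a made-up name, and holding
the old directory open and returning with fchdir() is an assumption
about how to "remember where you were", not something taken from the
replies.  It needs a system that has fchdir().]

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: chdir() into dir once, unlink each of the n
 * names relative to it, then return to the starting directory.
 * Returns the number of files removed, or -1 if the directory change
 * (either way) fails.
 */
int unlink_in_dir(const char *dir, const char *const *names, int n)
{
    int home, i, removed = 0;

    home = open(".", O_RDONLY);     /* handle on the current directory */
    if (home == -1)
        return -1;
    if (chdir(dir) == -1) {         /* the one full path lookup */
        close(home);
        return -1;
    }
    for (i = 0; i < n; i++)         /* single-component lookups only */
        if (unlink(names[i]) == 0)
            removed++;
    if (fchdir(home) == -1)         /* change back, as tanner warns */
        removed = -1;
    close(home);
    return removed;
}
```

[A call such as unlink_in_dir("/usr/tmp", names, count) leaves the
caller's working directory as it found it, so any relative pathnames
used afterwards still resolve as before.]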
--
From emory!uunet.UU.NET!auspex!auspex.com!guy Tue Feb 27 18:10:14 1990
Guy Harris

> Let me clarify my question from the previous posting:
> I know of namei() in the kernel, it has been suggested that doing
> the chdir() first will mean namei() will not have to be invoked as
> often.

No.  It will just mean that "namei()", in those systems that have it,
will do less work looking up the file.  In those systems that don't
have it, e.g. any system with SunOS-style vnodes - including S5R4 - the
equivalent routine will also do less work looking up the file.

In both cases, the reason is that it takes more work to look up

    foo/bar/bletch/mumble.c

than it does to look up

    mumble.c

because in the first case you have to search "foo" for "bar", and then
"bar" for "bletch", and then "bletch" for "mumble.c", while in the
second case you only have to search the current directory for
"mumble.c".  Directory name caches help here, but they don't completely
eliminate the slowdown.

In addition, it takes more CPU time to 1) construct the longer path in
your program and 2) copy it into the kernel.

[But does not unlink() have to chase back up the dir tree?...]

No.  The only permission "unlink()" requires to allow removal of the
file - or, more correctly, removal of a directory entry referring to a
file - is write permission on the directory containing that entry.
"rm", if invoked without the "-f" flag, also checks whether the file is
writable, but that doesn't require any directory searching either.

In other words,

    unlink("foo/bar/bletch/mumble.c")

is less efficient than

    unlink("mumble.c")

>And does the recommended method used vary based on Sys V or BSD 4.2/3 ??

I don't expect so.

--
From emory!lzga!bogatko Tue Feb 20 22:15:21 1990
GB

My vote is a chdir, followed by unlinks.

My authority is BACH - "Design of the Unix Op. Sys", section 4.4 pages
74 and 75.
Short summary -- each path resolution requires a namei resolution from
the first path component, if there are three components to the path,
then three namei resolutions are required.  If there is only one
component, then there is only one resolution.

As far as I could see from the 5.2 source, unlink calls iget on the
full path if it has a full path, and avoids iget if it has ".".
Admittedly I didn't spend a lot of time looking at the stuff (it
produces headaches) but it seems to imply that chdir is the better way
to go.

Incidentally, here's how I did it once.  The example suffers from doing
unlinks on complicated paths, but putting in a 'chdir' shouldn't be too
difficult.  This is System 5.3 specific.

/*
 *
 * cleanfiles.c
 *
 */

/*
INITIAL CODING DATE
Sun Sep 25 15:45:45 EDT 1988 by George M. Bogatko

HEADER FILES
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <time.h>

/*----------------
IMPORTED GLOBAL VARIABLE/FUNCTION DEF'S
----------------*/
extern char *regcmp();
extern char *regex();

/*----------------
INTERNAL GLOBAL VARIABLE/FUNCTION DEF'S
----------------*/
static char sccs_id[] = "cleanfiles.c 1 1/16/89 15:15:16";

/*----

SYNOPSIS:
    clean_files(directory, pattern, age)
    char *directory;
    char *pattern;
    long age;

DESCRIPTION:
    Clean_files removes all the files in 'directory' whose names
    match the regular expression 'pattern' and whose modification
    times exceed 'age' seconds.

RETURN:
    The function returns the number of files successfully removed.

EXAMPLE:
    to remove temporary files in '/usr/tmp', whose names begin
    with 'ConV', and are more than 10 minutes old, call:

    num_removed = clean_files("/usr/tmp", "^ConV", 600);
                                           |
                                           notice the caret

CAVEATS:
    Accidentally removed files cannot be recovered.  Use this carefully.

    This function uses 'regcmp' and 'regex', thus the pattern used
    in the match uses the syntax of 'ed', 'ex', and 'vi' NOT 'sh'.

    'regcmp' and 'regex' are located in libPW.a (cc ... -lPW).
------- */

clean_files(directory, pattern, age)
char *directory;
char *pattern;
long age;
{
DIR *dir;
struct dirent *dir_ent;
struct stat buf;
char *cmp_pat;
time_t curtime;
int num = 0;
char tempbuf[100];

    if( (dir = opendir(directory)) == (DIR *)NULL )
        return -1;
    curtime = time( (time_t *)NULL );
    cmp_pat = regcmp(pattern, (char *)NULL);

/*
 *
 * YOU COULD PUT A CHDIR HERE, but be careful if you want to
 * go back to where you were!
 *
 */

    while( (dir_ent = readdir(dir)) != NULL )
    {
        if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
        {
            sprintf(tempbuf, "%s/%s", directory, dir_ent->d_name);
            stat(tempbuf, &buf);
            if( (curtime - buf.st_mtime) > age )
            {
                if(unlink(tempbuf) != -1)
                    num++;
            }
        }
    }
    closedir(dir);
    free(cmp_pat);
    return num;
}

--
From emory!lzga!bogatko Wed Feb 21 10:40:13 1990
GB

Always one to never believe straight theory, I've done a test.

This program creates 500 dummy files in "/usr/tmp/1/2/3/4/5/6/7/8/9"
and then cleans them out, all from a C program.

I picked such a long path name under the assumption that the namei()
resolution would slow down the unlink process, and wanted to give
namei() a lulu of a path to chew on.

The results are in 'edres'.  As you can see, CHDIR is the better way
to go.

This was written and tested on a 3B600, under System 5.3.1.  If you're
not on such a box, that may explain why you don't have 'opendir',
'readdir', or 'closedir'.

Hope this helps.

***** CUT HERE *****
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	makefile
#	cleanmain.c
#	edres
#	testrm
# This archive created: Wed Feb 21 09:36:37 1990
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'makefile'" '(309 characters)'
if test -f 'makefile'
then
	echo shar: "will not over-write existing file 'makefile'"
else
sed 's/^ X//' << \SHAR_EOF > 'makefile'
 XD=
 X
 XCFLAGS=-I. -O -Ksd $D
 XLIBS=-lPW
 X
 Xall:
 X	make cleanmainC D=-DCHDIR
 X	rm cleanmain.o
 X	make cleanmainNC
 X	make clean
 X
 XcleanmainC: cleanmain.o
 X	cc -o cleanmainC cleanmain.o $(LIBS)
 X
 XcleanmainNC: cleanmain.o
 X	cc -o cleanmainNC cleanmain.o $(LIBS)
 X
 Xclean:
 X	rm cleanmain.o
 X
 Xclobber:
 X	rm cleanmainC
 X	rm cleanmainNC
SHAR_EOF
if test 309 -ne "`wc -c < 'makefile'`"
then
	echo shar: "error transmitting 'makefile'" '(should have been 309 characters)'
fi
fi
echo shar: "extracting 'cleanmain.c'" '(2114 characters)'
if test -f 'cleanmain.c'
then
	echo shar: "will not over-write existing file 'cleanmain.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'cleanmain.c'
 X#include <stdio.h>
 X#include <sys/types.h>
 X#include <sys/stat.h>
 X#include <dirent.h>
 X#include <time.h>
 X#include <sys/times.h>
 X
 X#define DIRECTORY "/usr/tmp/1/2/3/4/5/6/7/8/9"
 X#define TMP "_RmT_"
 X#define PAT "^_RmT_"
 X
 Xextern char *regcmp();
 Xextern char *regex();
 X
 Xmain(argc, argv)
 Xint argc;
 Xchar *argv[];
 X{
 Xstruct tms tbuf;
 Xlong btime, etime;
 Xchar *tnam;
 Xint i;
 Xint fd;
 Xint num;
 Xchar *dir_tory;
 X
 X#ifdef CHDIR
 X	fprintf(stderr, "CHDIR METHOD\n");
 X	chdir(DIRECTORY);
 X	dir_tory = ".";
 X#else
 X	fprintf(stderr, "NON-CHDIR METHOD\n");
 X	dir_tory = DIRECTORY;
 X#endif
 X
 X	fprintf(stderr, "creating 500 files\n");
 X
 X	btime = times(&tbuf);
 X	for(i=0; i< 500; i++)
 X	{
 X		tnam = tempnam(dir_tory, TMP);
 X		if( (fd = creat( tnam, 0666 )) == -1 )
 X		{
 X			perror("creat failure");
 X			exit(-1);
 X		}
 X		close(fd);
 X	}
 X	etime = times(&tbuf);
 X
 X	fprintf(stderr, "creating took %ld clicks\n", etime-btime);
 X	fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
 X	fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
 X	fprintf(stderr, "\n\n");
 X
 X	fprintf(stderr, "calling clean_files\n");
 X
 X	btime = times(&tbuf);
 X	num = clean_files(DIRECTORY, PAT);
 X	etime = times(&tbuf);
 X
 X	fprintf(stderr, "cleaned %d files\n\n",num);
 X	fprintf(stderr, "cleaning took %ld clicks\n", etime-btime);
 X	fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
 X	fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
 X	fprintf(stderr, "\n\n");
 X
 X	return 0;
 X}
 X
 X
 Xclean_files(directory, pattern)
 Xchar *directory;
 Xchar *pattern;
 X{
 XDIR *dir;
 Xstruct dirent *dir_ent;
 Xchar *cmp_pat;
 Xint num = 0;
 Xchar tempbuf[100];
 Xchar *dir_tory;
 X
 X#ifdef CHDIR
 X	chdir(directory);
 X	dir_tory = ".";
 X#else
 X	dir_tory = directory;
 X#endif
 X
 X	if( (dir = opendir(dir_tory)) == (DIR *)NULL )
 X		return -1;
 X	cmp_pat = regcmp(pattern, (char *)NULL);
 X
 X	while( (dir_ent = readdir(dir)) != NULL )
 X	{
 X		if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
 X		{
 X/*
 X * we'll have to use this sprintf in the non-chdir method, so I'll keep
 X * the overhead in the chdir method.
 X */
 X			sprintf(tempbuf, "%s/%s", dir_tory, dir_ent->d_name);
 X			if(unlink(tempbuf) != -1)
 X				num++;
 X		}
 X	}
 X	closedir(dir);
 X	free(cmp_pat);
 X	return num;
 X}
SHAR_EOF
if test 2114 -ne "`wc -c < 'cleanmain.c'`"
then
	echo shar: "error transmitting 'cleanmain.c'" '(should have been 2114 characters)'
fi
fi
echo shar: "extracting 'edres'" '(637 characters)'
if test -f 'edres'
then
	echo shar: "will not over-write existing file 'edres'"
else
sed 's/^ X//' << \SHAR_EOF > 'edres'
 XCHDIR METHOD
 X
 XCREAT	USR	SYS		CLEAN	USR	SYS
 X
 X1510	63	493		3375	88	807
 X1949	49	665		3064	68	952
 X1717	55	638		3069	82	978
 X1813	51	653		3054	73	938
 X1683	62	633		3074	71	896
 X1676	61	634		3057	85	945
 X1681	43	648		3083	59	958
 X1736	54	643		3061	69	936
 X1687	65	626		3090	80	870
 X1682	58	640		3143	66	893
 X
 X
 X******************
 X
 X
 XNON-CHDIR METHOD
 X
 XCREAT	USR	SYS		CLEAN	USR	SYS
 X
 X1774	61	1170		3591	69	1638
 X1764	72	1179		3278	89	1601
 X1742	77	1162		3278	103	1616
 X1738	65	1181		3253	80	1609
 X1785	66	1193		3283	77	1651
 X1730	75	1175		3320	90	1628
 X1722	81	1177		3289	104	1633
 X1756	75	1173		3256	96	1606
 X1750	61	1177		3391	82	1594
 X1833	66	1183		3258	90	1619
SHAR_EOF
if test 637 -ne "`wc -c < 'edres'`"
then
	echo shar: "error transmitting 'edres'" '(should have been 637 characters)'
fi
fi
echo shar: "extracting 'testrm'" '(182 characters)'
if test -f 'testrm'
then
	echo shar: "will not over-write existing file 'testrm'"
else
sed 's/^ X//' << \SHAR_EOF > 'testrm'
 X> results
 Xfor i in 1 2 3 4 5 6 7 8 9 10
 Xdo
 X	echo CHDIR round $i
 X	cleanmainC 2>>results
 Xdone
 X
 Xfor i in 1 2 3 4 5 6 7 8 9 10
 Xdo
 X	echo NON-CHDIR round $i
 X	cleanmainNC 2>>results
 Xdone
SHAR_EOF
if test 182 -ne "`wc -c < 'testrm'`"
then
	echo shar: "error transmitting 'testrm'" '(should have been 182 characters)'
fi
fi
exit 0
#	End of shell archive

--
-- 
Paul Anderson * h:404-565-0761 w:404-841-4000
{mathcs.emory,gatech}.edu!stiatl!pda || pda@SalesTech.Com