Newsgroups: comp.unix.wizards Path: utzoo!sq!lee From: lee@sq.sq.com (Liam R. E. Quin) Subject: Re: ftw (was Re: Anything faster than stat(S)? ...) Message-ID: <1989Dec1.021044.3164@sq.sq.com> Reply-To: lee@sq.com (Liam R. E. Quin) Socks: black, warm & furry Organization: Unixsys (UK) Ltd References: <152@norsat.UUCP> <2586@unisoft.UUCP> <159@norsat.UUCP> <1989Nov21.070322.6352@dragos.uucp> <11676@smoke.BRL.MIL> <1989Nov23.195629.6577@eng.umd.edu> Date: Fri, 1 Dec 89 02:10:44 GMT djm@eng.umd.edu (David J. MacKenzie) writes: > gwyn@brl.arpa (Doug Gwyn) writes: >> ruiu@dragos.UUCP (dragos) writes: >>>Speaking of which, does anyone have any knowledge of the status of FTW ? >>>I've been tempted to try reverse engineering the routines from the Usenix >>>paper for my "quaint" SysV.2 system. The original message has vanished, but to the person who wanted something faster than readdir()/clsedir(), the vversions of ftw() I have seen do themselves use the ndir readdir() and closedir() stuff, so they are certainly no faster. On a reasonably recent System V system, ftw can be very fast. For example, on my 16MHz 386/ix machine at home I was able to do a find / -print > /dev/null in well under 20 seconds, with a second run producing no disk accesses at all, as everything was in the cache. I had over 250 MBytes' worth of data in over 50,000 files, so that is not too bad (the amount of data being less significant than the number of files, of course!). One thing to do is to have a directory daemon -- you give it a directory, and it returns all of the sub directory and file names marked as such. This isn't too hard with messages, for example, and has the advantage that while one process is processing (e.g. printing the file names), the other can be doing stat() on them. This might be part of the motivation for the find /dir -print | cpio -lots -of -options > /dev/ice paradigm -- I don't know. Some database systems (e.g. Oracle) have a read-ahead daemon that fetches the next block in (for example) a linked list. In many cases (not sure about Oracle here) all it needs to do is read it -- this puts it in the Unix buffer cache for a few seconds, long enough for the database client to use it without Unix having to re-read it from disk. The trouble with doing this for find(1)-like program is that it can be hard to tell how effective it is in "real-life" situations, but there are cases where it can be a real win. Finally, if you are really in need of speed, you could consider keeping a btree of filenames and paths. You only need to check that the directory has not altered to determine that it has no new, lost or renamed children, so you can simply keep a time-since-last-changed. Now you can do better than one stat per file, because you only have to check each file once when building the database and each directory (not file) again later. I don't know how to make find(1) or ftw(3) much faster than this, and this at at a considerable cost in complexity. Lee -- Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!] lee@sq.com (Whilst visiting Canada from England, until Christmas) utai!anduk.uucp!lee (after Christmas) ...striving to promote the interproduction of epimorphistic conformability