Newsgroups: comp.unix.wizards
Path: utzoo!sq!lee
From: lee@sq.sq.com (Liam R. E. Quin)
Subject: Re: ftw (was Re: Anything faster than stat(S)? ...)
Message-ID: <1989Dec1.021044.3164@sq.sq.com>
Reply-To: lee@sq.com (Liam R. E. Quin)
Socks: black, warm & furry
Organization: Unixsys (UK) Ltd
References: <152@norsat.UUCP> <2586@unisoft.UUCP> <159@norsat.UUCP> <1989Nov21.070322.6352@dragos.uucp> <11676@smoke.BRL.MIL> <1989Nov23.195629.6577@eng.umd.edu>
Date: Fri, 1 Dec 89 02:10:44 GMT

djm@eng.umd.edu (David J. MacKenzie) writes:
> gwyn@brl.arpa (Doug Gwyn) writes:
>> ruiu@dragos.UUCP (dragos) writes:
>>>Speaking of which, does anyone have any knowledge of the status of FTW ?
>>>I've been tempted to try reverse engineering the routines from the Usenix
>>>paper for my "quaint" SysV.2 system.

The original message has vanished, but to the person who wanted something
faster than readdir()/closedir(): the versions of ftw() I have seen
themselves use the ndir readdir() and closedir() routines, so they are
certainly no faster.
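
To make that concrete, here is a rough sketch of what such a tree walk boils
down to. It is not taken from any real ftw() source; the names walk(),
visit_fn and print_name() are made up for illustration, and a real ftw()
has a different interface and does more error handling.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: called once for every name found. */
typedef void (*visit_fn)(const char *path, const struct stat *sb);

/* Example visitor (illustrative only): print the path. */
static void print_name(const char *path, const struct stat *sb)
{
    (void)sb;
    puts(path);
}

/*
 * Walk the tree rooted at path: one lstat() per name, and
 * opendir()/readdir()/closedir() to list each directory.
 * Returns the number of names visited (0 if path cannot be examined).
 * Note that one directory stays open per level of recursion.
 */
static long walk(const char *path, visit_fn visit)
{
    struct stat sb;
    struct dirent *de;
    DIR *dp;
    char child[4096];
    long n = 1;

    if (lstat(path, &sb) < 0)
        return 0;
    visit(path, &sb);
    if (!S_ISDIR(sb.st_mode))
        return 1;
    if ((dp = opendir(path)) == NULL)
        return 1;               /* unreadable directory: count it, skip contents */

    while ((de = readdir(dp)) != NULL) {
        int len;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        len = snprintf(child, sizeof child, "%s/%s", path, de->d_name);
        if (len < 0 || (size_t)len >= sizeof child)
            continue;           /* path too long for this sketch: skip it */
        n += walk(child, visit);
    }
    closedir(dp);
    return n;
}
```

Calling walk("/usr", print_name) would behave roughly like find /usr -print.
The cost is still one lstat() per name plus the directory reads, which is
why wrapping this in ftw() cannot beat calling readdir() yourself.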

On a reasonably recent System V system, ftw can be very fast.
For example, on my 16MHz 386/ix machine at home I was able to do a
	find / -print > /dev/null
in well under 20 seconds, with a second run producing no disk accesses
at all, as everything was in the cache.
I had over 250 MBytes' worth of data in over 50,000 files, so that is not
too bad (the amount of data being less significant than the number of
files, of course!).


One thing to do is to have a directory daemon -- you give it a directory,
and it returns all of the sub directory and file names marked as such.
This isn't too hard with messages, for example, and has the advantage that
while one process is processing (e.g. printing the file names), the other
can be doing stat() on them.
This might be part of the motivation for the
	find /dir -print | cpio -lots -of -options > /dev/ice
paradigm -- I don't know.

Some database systems (e.g. Oracle) have a read-ahead daemon that fetches
the next block in (for example) a linked list.  In many cases (not sure about
Oracle here) all it needs to do is read it -- this puts it in the Unix
buffer cache for a few seconds, long enough for the database client to
use it without Unix having to re-read it from disk.


The trouble with doing this for a find(1)-like program is that it can be hard
to tell how effective it is in "real-life" situations, but there are cases
where it can be a real win.

Finally, if you are really in need of speed, you could consider keeping a
btree of filenames and paths.  You only need to check that the directory
has not altered to determine that it has no new, lost or renamed children,
so you can simply keep the last-modified time of each directory.
Now you can do better than one stat per file, because you only have to
check each file once when building the database and each directory (not
file) again later.
I don't know how to make find(1) or ftw(3) much faster than this, and
this comes at a considerable cost in complexity.
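
The per-directory check might look something like the sketch below. It
leaves out the btree entirely and shows only the "has this directory
changed?" test; struct dir_record and both function names are hypothetical.
It also assumes st_mtime has one-second resolution, so a change made in the
same second the record was taken could be missed.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical database record: a directory and its mtime when last scanned. */
struct dir_record {
    char   path[1024];
    time_t mtime;
};

/* Record the current modification time of a directory.
 * Returns 0 on success, -1 if path is too long or is not a directory. */
static int dir_record_init(struct dir_record *r, const char *path)
{
    struct stat sb;

    if (strlen(path) >= sizeof r->path)
        return -1;
    if (stat(path, &sb) < 0 || !S_ISDIR(sb.st_mode))
        return -1;
    strcpy(r->path, path);
    r->mtime = sb.st_mtime;
    return 0;
}

/* One stat() per directory: returns 1 if the directory has gone away or
 * has had entries added, removed or renamed since the record was made,
 * 0 if the saved list of children can still be trusted. */
static int dir_record_stale(const struct dir_record *r)
{
    struct stat sb;

    if (stat(r->path, &sb) < 0 || !S_ISDIR(sb.st_mode))
        return 1;
    return sb.st_mtime != r->mtime;
}
```

Only directories reported stale need to be re-read with readdir(); for the
rest, the names stored in the database are used without touching the files.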

Lee
-- 
Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!]
lee@sq.com (Whilst visiting Canada from England, until Christmas)
utai!anduk.uucp!lee (after Christmas)
 ...striving to promote the interproduction of epimorphistic conformability