Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: Why is restore so slow?
Message-ID: <10003@dog.ee.lbl.gov>
Date: 18 Feb 91 10:29:21 GMT
References: <50235@olivea.atc.olivetti.com> <2880@redstar.cs.qmw.ac.uk>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 47
X-Local-Date: Mon, 18 Feb 91 02:29:22 PST

In article <2880@redstar.cs.qmw.ac.uk> liam@cs.qmw.ac.uk (William Roberts)
writes:
>Restore suffers from the fact that files are stored in inode-number order: 
>this is not the ideal order for creating files as it thrashes the namei-cache 
>because the files are recreated randomly all over the place.

Well, no and yes.

While the files are indeed in inode order, and the restore program (as
opposed to the old `restor' program) does recreate them in this order,
the Fast File System tends to set things up so that all the files in
any one directory are in the same cylinder group as that directory.
Depending on cylinder group sizes this may or may not overload the name
cache, since only the directory parts of the names are cached (each
trailing name is unique within its directory, but the directory must be
searched anyway to verify this first).
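To make the cost of that uniqueness check concrete, here is a toy
model in Python (not kernel code).  It assumes a directory is just a
flat list of names that is scanned linearly on every create; real FFS
directories are blocks of variable-length entries, so treat the counts
as illustrative only.  The names create_entry and restore_directory
are made up for this sketch.

```python
def create_entry(directory, name):
    """Add name to directory; return how many existing entries were examined."""
    examined = 0
    for existing in directory:
        examined += 1
        if existing == name:
            raise FileExistsError(name)
    # Only after the whole directory has been scanned is the name known to be new.
    directory.append(name)
    return examined


def restore_directory(names):
    """Create every name in one directory; return total entries examined."""
    directory = []
    total = 0
    for name in names:
        total += create_entry(directory, name)
    return total


# Creating n files examines 0 + 1 + ... + (n-1) = n(n-1)/2 entries in total.
print(restore_directory([str(i) for i in range(300)]))  # 44850
```

The quadratic growth is real, but as long as the directory blocks stay
in memory these comparisons involve no disk I/O.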

More important are two other facts:

 - Each directory must be scanned entirely (to make sure the name is unique);
 - Directory operations are synchronous.

The latter is usually the performance killer: the directory blocks
tend to remain in the buffer cache, so the scans cost little I/O, but
each synchronous write must wait for the disk.  Directory writes are
done synchronously to make crash recovery possible.  Ordered (but
otherwise delayed) writes should give the same crash-recovery guarantee
with a much smaller performance penalty; this is being investigated.
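
As a rough illustration of what ordering buys, here is a toy model in
Python (not kernel code, and not a description of any particular
implementation).  The class OrderedDelayedDisk and the helpers create
and consistent are hypothetical names invented for this sketch.  It
assumes the only rule that matters is that a file's inode reaches the
disk before the directory entry that names it.

```python
from collections import deque


class OrderedDelayedDisk:
    """Writes return at once but reach the disk later, in issue order."""

    def __init__(self):
        self.pending = deque()  # accepted but not yet on disk
        self.on_disk = []       # writes that have actually reached the disk

    def write(self, kind, name):
        self.pending.append((kind, name))  # no waiting for the disk here

    def flush(self, limit=None):
        """Move up to `limit` pending writes to disk, oldest first."""
        n = len(self.pending) if limit is None else min(limit, len(self.pending))
        for _ in range(n):
            self.on_disk.append(self.pending.popleft())
        return n


def create(disk, name):
    disk.write("inode", name)   # inode first ...
    disk.write("dirent", name)  # ... then the directory entry that refers to it


def consistent(on_disk):
    """True if no directory entry is on disk without its inode before it."""
    seen = set()
    for kind, name in on_disk:
        if kind == "inode":
            seen.add(name)
        elif name not in seen:
            return False
    return True


disk = OrderedDelayedDisk()
for name in ("a", "b", "c"):
    create(disk, name)
disk.flush(limit=3)             # "crash" after three of the six writes
print(disk.on_disk)             # [('inode', 'a'), ('dirent', 'a'), ('inode', 'b')]
print(consistent(disk.on_disk))  # True
```

Wherever the flush is cut off, the on-disk prefix never contains a
directory entry pointing at an unwritten inode, yet no create had to
wait for the disk.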

>/usr/spool/news/comp/unix/internals/5342 and this took an incredibly long time 
>to restore. /usr/mail contains several hundred files but no subdirectories and 
>restored in about the same sort of time as it took to dump. 

The presence or absence of subdirectories is largely irrelevant: the
problem is the large number of files.  One big file restores much
faster than several dozen small files, even though both take the same
amount of space, because one big file costs one synchronous directory
write (preceded by one synchronous inode write) followed by many
asynchronous data writes, while each small file pays for its own pair
of synchronous writes.
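
A back-of-the-envelope count of the two cases, as a Python sketch.  The
function name restore_writes and the 8192-byte block size are
assumptions made for the example, and the model ignores indirect
blocks, fragments and directory growth.

```python
def restore_writes(file_sizes, block_size=8192):
    """Return (synchronous, asynchronous) write counts for restoring the files."""
    # One synchronous inode write plus one synchronous directory write per file.
    sync_writes = 2 * len(file_sizes)
    # Data blocks go out asynchronously; round each file up to whole blocks.
    async_writes = sum(-(-size // block_size) for size in file_sizes)
    return sync_writes, async_writes


one_big = restore_writes([50 * 8192])     # a single 400 KB file
many_small = restore_writes([8192] * 50)  # fifty 8 KB files, same total space
print(one_big)     # (2, 50)
print(many_small)  # (100, 50)
```

The data traffic is identical; only the number of writes the restore
has to wait for differs, by a factor equal to the number of files.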

If you do many full file system restores, it would probably be worth
your effort to make a kernel that does delayed writes for inode and
directory operations, and run it (or enable delayed writes on each file
system in question) each time you do such a restore.  If the system
crashes, you can just start over.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov