Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Sun bogosities, including MMU thrashing
Message-ID: <PCG.91Jan23183334@odin.cs.aber.ac.uk>
Date: 23 Jan 91 18:33:34 GMT
References: <5257@auspex.auspex.com> <3956@skye.ed.ac.uk>
	<PCG.91Jan18142616@teachk.cs.aber.ac.uk> <5390@auspex.auspex.com>
	<PCG.91Jan21160353@odin.cs.aber.ac.uk>
	<1991Jan21.225211.17757@gpu.utcs.utoronto.ca>
Sender: aro@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 75
Nntp-Posting-Host: odin
In-reply-to: dennis@gpu.utcs.utoronto.ca's message of 21 Jan 91 22:52:11 GMT

On 21 Jan 91 22:52:11 GMT, dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) said:

dennis> In article <PCG.91Jan21160353@odin.cs.aber.ac.uk>
dennis> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> [ ... there are good reasons explained by the Unix authors why it
pcg> would have been a bad idea to double the block size from 512 to
pcg> 1024 bytes ... ]

dennis> While I'm unwilling to dig through the references to determine
dennis> whether Thompson and Ritchie actually said this, if they did I
dennis> do think they may have changed their minds about it.  The file
dennis> system used (on Vaxes) by Version 8 was essentially the V7 file
dennis> system with the block size increased to 4096.

This is the filesystem designed for the /tmp partitions. It is described
in some BSTJ issue thick with articles on databases under UNIX.

dennis> If memory serves, the stated reason this was done was simply
dennis> that it made the file system run 8 times faster under typical
dennis> loads.

Ah, for /tmp it works -- almost all file accesses on /tmp are
sequential, and raising the block size is a stupid but effective way of
achieving greater physical clustering and read-ahead, at the expense of
latency, when it is known that "typical loads" are mostly sequential access.

The problems come when loads are not "typical". Then performance suffers
horribly, because the mandatory extra clustering and read-ahead is not
merely useless but highly damaging: buffer cache hit rates drop, and
each access suffers extra internal fragmentation.
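
To make the cache argument concrete, here is a minimal sketch, not a
measurement of any real system. It assumes an LRU buffer cache of fixed
total size (64 KB), 100 small records scattered 8 KB apart, and uniformly
random accesses; all names and numbers are made up for illustration.

```python
from collections import OrderedDict
import random

def simulate(block_size, cache_bytes, offsets):
    """Run byte offsets through an LRU buffer cache of fixed total size.

    Returns (hit rate, bytes transferred from disc on misses).
    """
    nbuf = cache_bytes // block_size      # bigger blocks => fewer buffers
    cache = OrderedDict()                 # block number -> True, in LRU order
    hits = misses = 0
    for off in offsets:
        blk = off // block_size
        if blk in cache:
            hits += 1
            cache.move_to_end(blk)        # mark as most recently used
        else:
            misses += 1
            cache[blk] = True
            if len(cache) > nbuf:
                cache.popitem(last=False) # evict least recently used
    return hits / len(offsets), misses * block_size

# Assumed workload: 100 small records, each in its own block, hit at random.
rng = random.Random(0)
RECORDS, STRIDE, CACHE_BYTES = 100, 8192, 64 * 1024
offsets = [rng.randrange(RECORDS) * STRIDE for _ in range(10000)]

small_rate, small_bytes = simulate(512, CACHE_BYTES, offsets)
large_rate, large_bytes = simulate(4096, CACHE_BYTES, offsets)
print("512-byte blocks: ", small_rate, small_bytes)
print("4096-byte blocks:", large_rate, large_bytes)
```

With 512-byte blocks the cache has 128 buffers and the whole working set
fits; with 4096-byte blocks it has only 16, and every miss also drags in
eight times as many bytes for the same small record.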

This is what I call "billjoyism": designing something that works
well 80% of the time and breaks down badly in the remaining 20%, when a
little more hard thinking (some call it "engineering") would find a more
flexible solution, like, in the example above, dynamic adaptive
clustering (which happens almost by default if one switches to a free
list ordered by position instead of time, e.g. a bitmap based one)
instead of static predictive clustering like raising the block size.
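
A minimal sketch of the free list point, under loud assumptions: a toy
64-block disc, a made-up order in which blocks were freed, and two
hypothetical allocators (LifoFreeList, BitmapFreeList). It is not the
code of any real filesystem; it only shows why ordering by position
tends to give contiguous files without any fixed clustering.

```python
NBLOCKS = 64

def scrambled_free_order(nblocks=NBLOCKS, stride=37):
    """Assumed order in which blocks were freed after some file churn."""
    return [(i * stride) % nblocks for i in range(nblocks)]

class LifoFreeList:
    """Free list ordered by time: the most recently freed block goes first."""
    def __init__(self, freed_in_order):
        self.stack = list(freed_in_order)

    def alloc(self, hint=0):
        return self.stack.pop()           # position on disc is ignored

class BitmapFreeList:
    """Free list ordered by position: one flag per block, scanned from a hint."""
    def __init__(self, freed_in_order, nblocks=NBLOCKS):
        self.free = [False] * nblocks
        for b in freed_in_order:
            self.free[b] = True

    def alloc(self, hint=0):
        n = len(self.free)
        for k in range(n):                # first free block at or after hint
            b = (hint + k) % n
            if self.free[b]:
                self.free[b] = False
                return b
        raise RuntimeError("no free blocks")

def allocate_file(allocator, nblocks_wanted):
    """Allocate blocks one at a time, hinting at the block after the last."""
    blocks, hint = [], 0
    for _ in range(nblocks_wanted):
        b = allocator.alloc(hint)
        blocks.append(b)
        hint = b + 1
    return blocks

def adjacent_pairs(blocks):
    """Count consecutive file blocks that are also consecutive on disc."""
    return sum(1 for a, b in zip(blocks, blocks[1:]) if b == a + 1)

freed = scrambled_free_order()
lifo_file = allocate_file(LifoFreeList(freed), 8)
bitmap_file = allocate_file(BitmapFreeList(freed), 8)
print("time-ordered:    ", lifo_file, adjacent_pairs(lifo_file))
print("position-ordered:", bitmap_file, adjacent_pairs(bitmap_file))
```

Both allocators start from the same set of free blocks; only the order in
which they hand them out differs, and the block size stays small.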

dennis> And I distinctly remember arguments being made at the time to
dennis> the effect that the speed of the Berkeley fast file system
dennis> (still a fairly recent innovation then) was almost exclusively
dennis> due to the larger block size, and that the block clustering
dennis> algorithm, which makes the supporting code complex and
dennis> relatively CPU-intensive when writing, really was unnecessary.

Yes, on that type of machine (stupid disc controllers, timesharing)
that's true. The question is: does the improvement come from fixed
static clustering as such, or from its consequences in the case of
sequential access? ....


The billjoys will say "so what, sequential access is 80%, so we go for
it, and damn the rest!". Unfortunately there are two problems:

1) random access to small files is fairly common in UNIX, and cannot be
so easily dismissed: directories, inode pages. Some little dbm, too.

2) a lot of the sequential access preponderance in UNIX is an historical
consequence of the fact that it was especially optimized in the original
design, and that it has become even more so with time.


Thus sequential access is used also where under a more balanced design
random access would be used. For example UNIX editors traditionally
copied the file to edit twice sequentially on every edit. Now they load
it into memory and write it back from there, which gives most VM
subsystems the fits, for other similar reasons. Also, the *contents* of
directories are accessed sequentially, when a single B-tree based
directory file (Mac, Cedar) or similar could be faster and more compact.
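
A rough sketch of the directory point. Note the swap: binary search over
a sorted in-memory list stands in for a B-tree here (it is not a B-tree),
and the directory contents, names and inode numbers are invented. It only
compares how many entries each lookup has to examine.

```python
def linear_lookup(entries, name):
    """Scan directory entries in order; return (inode, entries examined)."""
    for n, (entry_name, inode) in enumerate(entries, start=1):
        if entry_name == name:
            return inode, n
    return None, len(entries)

def indexed_lookup(sorted_entries, name):
    """Binary search over entries sorted by name; return (inode, probes)."""
    lo, hi, probes = 0, len(sorted_entries), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        entry_name, inode = sorted_entries[mid]
        if entry_name == name:
            return inode, probes
        if entry_name < name:
            lo = mid + 1
        else:
            hi = mid
    return None, probes

# Assumed directory: 1024 entries in creation order, made-up inode numbers.
directory = [("file%04d" % i, 1000 + i) for i in range(1024)]
index = sorted(directory)

linear_result = linear_lookup(directory, "file1023")
indexed_result = indexed_lookup(index, "file1023")
print("sequential scan:", linear_result)
print("sorted index:   ", indexed_result)
```

A real B-tree would add the on-disc page structure, but the count of
entries examined per lookup grows in the same logarithmic way.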
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk