Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Sun bogosities, including MMU thrashing
Message-ID:
Date: 23 Jan 91 18:33:34 GMT
References: <5257@auspex.auspex.com> <3956@skye.ed.ac.uk> <5390@auspex.auspex.com> <1991Jan21.225211.17757@gpu.utcs.utoronto.ca>
Sender: aro@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 75
Nntp-Posting-Host: odin
In-reply-to: dennis@gpu.utcs.utoronto.ca's message of 21 Jan 91 22:52:11 GMT

On 21 Jan 91 22:52:11 GMT, dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) said:

dennis> In article
dennis> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> [ ... there are good reasons explained by the Unix authors why it
pcg> would have been a bad idea to double the block size from 512 to
pcg> 1024 bytes ... ]

dennis> While I'm unwilling to dig through the references to determine
dennis> whether Thompson and Ritchie actually said this, if they did I
dennis> do think they may have changed their minds about it. The file
dennis> system used (on Vaxes) by Version 8 was essentially the V7 file
dennis> system with the block size increased to 4096.

This is the filesystem designed for the /tmp partitions. It is described
in some BSTJ issue thick with database-under-UNIX articles.

dennis> If memory serves, the stated reason this was done was simply
dennis> that it made the file system run 8 times faster under typical
dennis> loads.

Ah, for /tmp it works -- almost all file accesses on /tmp are
sequential, and raising the block size is a stupid but effective way of
achieving greater physical clustering and read-ahead, at the expense of
latency, when it is known that "typical loads" are about sequential
access. The problems come when loads are not "typical".
Then performance suffers horribly, because the mandatory extra
clustering and read-ahead is not merely of no benefit, it is highly
damaging: buffer cache hit rates drop, and there is extra internal
access fragmentation.

This is what I usually call "billjoyism": designing something that works
well 80% of the time and breaks down badly in the remaining 20%, when a
little more hard thinking (some call it "engineering") would find a more
flexible solution. In the example above that would be dynamic adaptive
clustering (which happens almost by default if one switches to a free
list ordered by position instead of time, e.g. a bitmap based one)
instead of static predictive clustering like raising the block size.

dennis> And I distinctly remember arguments being made at the time to
dennis> the effect that the speed of the Berkeley fast file system
dennis> (still a fairly recent innovation then) was almost exclusively
dennis> due to the larger block size, and that the block clustering
dennis> algorithm, which makes the supporting code complex and
dennis> relatively CPU-intensive when writing, really was unnecessary.

Yes, on that type of machine (stupid disc controllers, timesharing)
that's true. The question: is the merit of the improvement because of
fixed static clustering or because of its consequences in the case of
sequential access? ....

The billjoys will say "so what, sequential access is 80%, so we go for
it, and damn the rest!". Unfortunately there are two problems:

  1) random access to small files is fairly common in UNIX, and cannot
     be so easily dismissed: directories, inode pages. Some little dbm,
     too.

  2) a lot of the sequential access preponderance in UNIX is an
     historical consequence of the fact that it was especially
     optimized in the original design, and that it has become even more
     so with time. Thus sequential access is used also where under a
     more balanced design random access would be used.
     For example UNIX editors traditionally copied the file being
     edited twice sequentially on every edit. Now they load it into
     memory and write it back from there, which gives most VM
     subsystems fits, for other similar reasons. Also, the *contents*
     of directories are accessed sequentially, when a single B-tree
     based directory file (Mac, Cedar) or similar could be faster and
     more compact.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
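
A footnote on the remark above that clustering "happens almost by
default" with a free list ordered by position instead of time. What
follows is only a toy sketch under loud assumptions: block numbers stand
in for disk positions, the scrambled freeing order is an invented
pattern rather than measured data, and every name in it
(free_blocks_in_freeing_order, alloc_time_ordered,
alloc_position_ordered, adjacent_pairs, demo) is hypothetical and not
taken from any real file system. It allocates 8 blocks for one file from
the same set of free blocks, once from a time-ordered list and once from
a bitmap, and counts how many consecutive file blocks end up adjacent on
disk.

```python
# Toy model only: block numbers stand in for disk positions, and the
# "aged" freeing order is an invented pattern, not measured data.

NBLOCKS = 64  # assumed tiny disk; a power of two so the scramble below works


def free_blocks_in_freeing_order(nblocks=NBLOCKS, step=37):
    """Return the free blocks in the scrambled order they were released.

    step is odd and nblocks is a power of two, so (i * step) % nblocks
    visits every block exactly once. Every fifth block is treated as
    still in use, so the free space has holes in it.
    """
    order = [(i * step) % nblocks for i in range(nblocks)]
    return [blk for blk in order if blk % 5 != 0]


def alloc_time_ordered(free_list, count):
    """Free list ordered by time: hand out the most recently freed blocks."""
    return [free_list.pop() for _ in range(count)]


def alloc_position_ordered(bitmap, count, hint=0):
    """Free list ordered by position: take the first free bits at or after hint."""
    got = []
    for blk in range(hint, len(bitmap)):
        if len(got) == count:
            break
        if bitmap[blk]:
            bitmap[blk] = False  # mark the block as allocated
            got.append(blk)
    return got


def adjacent_pairs(blocks):
    """Count consecutive file blocks that are also consecutive on disk."""
    return sum(1 for a, b in zip(blocks, blocks[1:]) if b == a + 1)


def demo(count=8):
    """Allocate count blocks from the same free set under both orderings."""
    free = free_blocks_in_freeing_order()
    bitmap = [False] * NBLOCKS
    for blk in free:
        bitmap[blk] = True  # True means free
    by_time = alloc_time_ordered(list(free), count)
    by_position = alloc_position_ordered(bitmap, count)
    return by_time, by_position


if __name__ == "__main__":
    by_time, by_position = demo()
    print("time-ordered:    ", by_time, adjacent_pairs(by_time))
    print("position-ordered:", by_position, adjacent_pairs(by_position))
```

In this toy setup the position-ordered allocation gets 6 adjacent pairs
out of a possible 7, the time-ordered one gets none, and the block size
is the same in both cases. It says nothing about real workloads; it only
shows the mechanism.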