Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Sun bogosities, including MMU thrashing
Message-ID:
Date: 21 Jan 91 16:03:53 GMT
References: <5257@auspex.auspex.com> <3956@skye.ed.ac.uk> <5390@auspex.auspex.com>
Sender: aro@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 99
Nntp-Posting-Host: odin
In-reply-to: guy@auspex.auspex.com's message of 20 Jan 91 20:13:55 GMT

On 20 Jan 91 20:13:55 GMT, guy@auspex.auspex.com (Guy Harris) said:

pcg> [ ... raising the default block size is not a clever move ... ]

pcg> It also has some big disadvantages, hinted at by Thompson&Ritchie in
pcg> the V7 papers (they advised against doubling the block size from 1 to
pcg> 2 sectors with terse and cogent reasoning).

guy> Which paper was that?

One of "Unix Implementation", "Unix IO system", or "A retrospective".
The argument was that a lot of Unix files are very small (directories as
a rule); BSD fragments get around the space overhead problem, but not
around the buffer cache hit rate problems.

Doubling the block size is based on the expectation that each IO
transaction will read in or write out twice the number of "useful" bytes
as before. For purely sequential access this is approximately true, but
the same effect can be achieved more efficiently by using dynamic
clustering and extending read ahead to N blocks instead of just 1 as in
V7. For random access one loses badly: each transfer moves twice the
bytes, most of them not wanted, and the same amount of cache memory holds
half as many distinct blocks.

guy> [ ... noting that a lot of SunOS's questionable design decisions
guy> are taken straight off from BSD ... ]

Well, as 4.2BSD was being completed, some key OS designer was still
working with UCB CSRG as well as working at Sun itself.

pcg> For some reason there is still an 'nbuf' variable in the kernel to set
pcg> the number of buffer headers. It would be interesting to know what role
pcg> have buffer headers in the new architecture.
guy> From "SunOS Virtual Memory Implementation", in some EUUG proceedings:

Yes, I had read that paper (I referred to "some reason"). The reasons
seem weak, given that, for example, the interior of the kernel has been
changed quite a bit anyhow. Still, the power of backward compatibility!

pcg> Unfortunately, the Sun virtual memory technology has an 8KB page size.

guy> At least on Sun-3s, Sun-3xs, and Sun-4s. The "desktop SPARC" machines
guy> have a 4KB page size. I don't remember what the page size on the 386i
guy> was. (The page size on the Sun-2 was 2KB.)

Ah yes. At least 2KB was reasonable, and was a function of the Sun-2
having a limited address space. But the choice of caching entire page
tables meant that when Sun had to enlarge the virtual address space they
had to enlarge the page size as well.

I think that there are some good arguments both for and against the Sun
MMU design for the Sun 1; maybe also for the Sun 2; but in the Sun 3 case
I would not rate the "pro" arguments as good.

The 386i was 4KB, like all 386s, and the SPARCs. Barely tolerable, but
still too large: leaving aside IO problems, for which you want dynamic
clustering rather than bigger page sizes, the smaller the page size, the
better. IMNHO a page size of 1024 bytes is more or less the right one,
given prevailing IO technology etc.

pcg> This means that under SunOS 4 you effectively no longer have variable
pcg> sized buffers against the problem of internal fragmentation. All of
pcg> central memory uses only 8KB pages as the allocation unit. We can thus
pcg> contemplate the ridiculous situation that internal fragmentation is a
pcg> concern for disc space wastage, but not for memory space.

guy> Yeah, I thought so too, once; I asked Rob Gingell about it, and he
guy> pointed out that SunOS 4.x is no different from 4.2andupBSD in this
guy> regard, as noted above.

No surprise again :-).
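To put rough numbers on the internal fragmentation point, here is a minimal sketch in Python. It assumes each cached object occupies a whole number of allocation units; the function names and the list of object sizes are made up for illustration and are not measurements from any SunOS or BSD system.

```python
def wasted_bytes(size, unit):
    """Bytes left unused when `size` bytes are held in whole `unit`-byte chunks."""
    if size == 0:
        return 0
    chunks = -(-size // unit)  # ceiling division
    return chunks * unit - size


# Hypothetical sizes (in bytes) of small files and directories.
# Illustrative only; not measured data.
SAMPLE_SIZES = [24, 512, 700, 1500, 3000, 9000]


def waste_fraction(sizes, unit):
    """Fraction of the allocated memory that holds no useful data."""
    allocated = sum(s + wasted_bytes(s, unit) for s in sizes)
    return (allocated - sum(sizes)) / allocated


if __name__ == "__main__":
    for unit in (1024, 4096, 8192):
        frac = waste_fraction(SAMPLE_SIZES, unit)
        print(f"{unit:5d}-byte unit: {frac:.0%} of allocated memory unused")
```

With mostly small objects, the unused fraction grows quickly as the allocation unit goes from 1KB to 8KB, which is the same effect that fragments were introduced to limit on disc.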
More surprising (but maybe not, considering human nature) is that Sun
has never corrected the design mistakes inherited from 4.xBSD. It's just
like System V.3.2 still having the expansion swap policy inherited from
V7, where it made sense on slow-CPU, fast-IO PDPs, a situation 100% the
opposite of current machines.

guy> Both of them allocate page-sized chunks for buffers. 4.2andupBSD
guy> just happens to have originally been done on a machine with
guy> 512-byte physical pages, and used 1024-byte "logical" pages. The
guy> 4.2BSD buffer pool code in SunOS 3.x used 8KB pages as the
guy> allocation unit on Sun-3s....

Oh yes, and this is the crazy thing. It is yet another nefarious
consequence of the botched Sun 3 MMU architecture: that the page size is
as large as the disk block size, not as small as the disk fragment size.

Shifting the blame to 4.2BSD is too easy; a large page size is a
decision that Sun took with the Sun 3 MMU, against all reasonableness
and known properties of VM systems.

In summary, I agree that a lot of Sun's misdesigns are inherited from
BSD (some are even inherited from V7/32V!); that Sun appropriated them
enthusiastically, and stuck to them for years and through several OS
releases and product generations, is an impressive demonstration of
something.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk