Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.sys.hp
Subject: Re: Disk performance HP-UX 6.5
Message-ID: <1550@aber-cs.UUCP>
Date: 21 Dec 89 20:12:36 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Organization: Dept of CS, UCW Aberystwyth (Disclaimer: my statements are purely personal)
Lines: 104

In article <2771@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>Actually, what's happening here isn't what you think either.

Nor is it what you think.... Well, apparently you are right, but in a
sense this is disappointing; Sun could have been cleverer, but has been
clever enough.

>SunOS does, in fact, do UFS and NFS I/O by something that basically
>amounts to memory mapping. What a "read" of a UFS or NFS file, or, I
>think, a block special file amounts to is "map the region being read
>into the kernel's address space, and then copy from that mapped region
                                           ^^^^
>into the user's buffer".

If they do copy-on-write, and especially if the buffers are page
aligned, nothing needs to happen.

In particular, the stdio library and even open(2), read(2), ... are
really layered onto memory mapping in SunOS 4. The old Unix I/O system
is nearly dead; open(2) maps the file, and read(2) accesses it directly.
In other words, the traditional Unix I/O path is used only for some
character devices (some use streams instead). SunOS 4 emulates a PDP on
a Multics...

>The kernel obviously has no idea that the data in question is going to
>be written to "/dev/null", so it copies it anyway, which means it has
>to fault the data in from the file if it's not already in memory.

Not necessarily true. With read(2) implemented as copy-on-write, the
data pages would never actually be touched; they could just as well
remain on disc. This of course would more easily be true if Sun's dd had
page-aligned buffers, or if SunOS implemented unaligned copy-on-write
(much more difficult).
>(Yes, I've read the code; that's how it works.)

This settles it.

>"dd", on the other hand, actually does "read"s from its input file (as
>proven by running "trace" on it),

Again, if dd buffers were page aligned (they should be) and
copy-on-write were used, this would not need to happen. Unfortunately
SunOS does not do this, so more investigation is needed.

I have some more data points, from some simple tests under both SunOS 3
and 4, that show something interesting. The discs in both cases are
broadly comparable; CPU speeds are not very important here.

To do the reads, and be sure that pages were faulted in, I used a
trivial program like this:

    #include <unistd.h>

    int main(void)
    {
      char buf[16*1024]; /* hope the optimizer does not do funny tricks */

      /* read two 8 KB pages per call and dirty one byte in each page */
      while (read(0, buf, sizeof buf) > 0)
        buf[1*1024] = buf[9*1024] = 'x';
      return 0;
    }

which reads two Sun pages (8 KB each) at a time and modifies a byte in
each, to make sure that copy-on-write, if present, is exercised and the
page is faulted in. The following figures are quite approximate, but
representative:

    SunOS  Mbytes  Type   Seconds  KB/sec  I/Os  Machine
      3      10    block     95      100   5100  Sun 3/50
      3      10    raw       20      500     ?   Sun 3/50
      4      24    -         45      500    530  Sun 3/280

These results are less optimistic than some that have been posted, but
impressive nonetheless. Getting over 500KB/s out of a disc is no mean
feat. SunOS manages to do that both using raw device access under 3 and
using either device type under 4 (the "-" in the Type column). The fact
that SunOS 4 gives with the block device the same performance as SunOS 3
on the raw device means that mapping disc blocks is effective: it avoids
the extra overhead associated with buffer cache management.

Actually, the interesting column is the "I/Os" column, which tells us
how many I/O operations were scheduled.
The I/O count is not available for SunOS 3 raw devices, but it tells us
(and I have other data that confirms this) that even if we are reading
two pages at a time, SunOS 4 actually fetches six at a time; that is, it
does heavy clustering of I/O requests (hopefully only when it detects
sequential access). This looks like a big win, *for sequential access*.
As to the actual faulting and page copying, apparently it does not
matter a lot, given the CPU speed and the overlapping of I/O operations.

So the result here is that, with the buffer cache eliminated, mapped
devices exploit the available bandwidth well, while going through the
buffer cache and its strategy function reduces effective bandwidth to
20% (100 versus 500 KB/sec).

It would be interesting to see the bandwidth reduction due to the
filesystem under both technologies. If anybody wants to have a go...
Remember that you must unmount and remount a filesystem before each
test, to invalidate any in-core pages.

It would be interesting to see similar numbers for HP-UX (one of whose
incarnations used to have an extent-based filesystem, BTW).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk