Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.sys.hp
Subject: Re: Disk performance HP-UX 6.5
Message-ID: <1550@aber-cs.UUCP>
Date: 21 Dec 89 20:12:36 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 104

In article <2771@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
    >Actually, what's happening here isn't what you think either.
    
    Nor is it what you think....

Well, apparently you are right, but in a sense this is disappointing; Sun
could have been cleverer, but has been clever enough.
    
    SunOS does, in fact, do UFS and NFS I/O by something that basically
    amounts to memory mapping.  What a "read" of a UFS or NFS file, or, I
    think, a block special file amounts to is "map the region being read
    into the kernel's address space, and then copy from that mapped region
					      ^^^^
    into the user's buffer".

If they do copy-on-write, and especially if the buffers are page aligned,
no data need actually be copied. In particular, the stdio library and even
open(2), read(2), ... are really layered onto memory mapping in SunOS 4. The
old Unix I/O system is nearly dead; open(2) maps the file, and read(2)
accesses it directly.

In other words, the traditional Unix I/O is done only for some character
devices (some use streams instead). SunOS 4 emulates a PDP on a Multics...
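A user-level analogue of the read path Guy describes (map the file region,
then copy from the mapping into the caller's buffer) can be sketched with
mmap(2). This is a hypothetical illustration, not the SunOS kernel code;
`mapped_read` is a made-up name. It only shows why it is the copy, not the
read call itself, that faults file pages in.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(2), for callers */
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical "read by mapping": map the requested region of fd, copy it
 * into buf, unmap.  Returns bytes copied, 0 at end of file, -1 on error.
 */
ssize_t mapped_read(int fd, void *buf, size_t len, off_t off)
{
    struct stat st;
    long pagesz = sysconf(_SC_PAGESIZE);

    if (pagesz <= 0 || off < 0 || fstat(fd, &st) < 0)
        return -1;
    if (off >= st.st_size)
        return 0;                              /* at or past end of file */
    if ((off_t)len > st.st_size - off)
        len = (size_t)(st.st_size - off);      /* clip to end of file */
    if (len == 0)
        return 0;

    /* mmap offsets must be page aligned: round down, remember the slack */
    off_t base = off - off % pagesz;
    size_t slack = (size_t)(off - base);

    char *map = mmap(NULL, len + slack, PROT_READ, MAP_SHARED, fd, base);
    if (map == MAP_FAILED)
        return -1;
    memcpy(buf, map + slack, len);   /* touching the mapping faults pages in */
    munmap(map, len + slack);
    return (ssize_t)len;
}
```

As in the kernel case, nothing here knows where the data will end up, so
every byte asked for is copied, and thus faulted in, regardless.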

    The kernel obviously has no idea that the data in question is going to
    be written to "/dev/null", so it copies it anyway, which means it has to
    fault the data in from the file if it's not already in memory.

Not necessarily true. With read(2) implemented as copy-on-write, the data
pages would never actually be touched; they could just as well remain on
disc. This of course would be easier if Sun's dd had page-aligned buffers,
or if SunOS implemented unaligned copy-on-write (much more difficult).
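Arranging for page-aligned buffers, as suggested here for dd, is easy from
user code. A minimal sketch, assuming a POSIX system with posix_memalign(3);
the helper names are hypothetical, and whether a given kernel then avoids
the copy is outside what this code can control:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Nonzero if p starts on a page boundary. */
int is_page_aligned(const void *p)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 && (uintptr_t)p % (uintptr_t)ps == 0;
}

/*
 * Allocate a buffer that starts on a page boundary and spans a whole
 * number of pages.  The rounded length is stored in *rounded if non-NULL.
 * Returns NULL on failure; release with free().
 */
void *alloc_page_aligned(size_t nbytes, size_t *rounded)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (ps <= 0)
        return NULL;
    size_t pagesz = (size_t)ps;
    size_t len = (nbytes + pagesz - 1) / pagesz * pagesz;  /* round up */
    if (len == 0)
        len = pagesz;
    if (posix_memalign(&p, pagesz, len) != 0)
        return NULL;
    if (rounded != NULL)
        *rounded = len;
    return p;
}
```

A stack array such as `char buf[16*1024]` carries no such guarantee, which
is why a program wanting aligned transfers has to ask for it explicitly.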

    (Yes, I've read the code; that's how it works.)

This settles it.
    
    "dd", on the other hand, actually does "read"s from its input file (as
    proven by running "trace" on it),

Again, if dd's buffers were page aligned (they should be) and copy-on-write
were used, this would not need to happen. Unfortunately SunOS does not do
this, so more investigation is needed.

I have some more data points, from some simple tests under both SunOS 3
and 4, that show something interesting.

The discs in both cases are broadly comparable; CPU speeds are not very
important here. To do the reads, and make sure that pages were faulted in,
I used a trivial program like this:

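	#include <unistd.h>

	int main(void)
	{
		char buf[16*1024];	/* two 8KB Sun-3 pages */

		/* hope the optimizer does not do funny tricks */
		while (read(0, buf, sizeof buf) > 0)
			/* dirty one byte in each 8KB half of the buffer */
			buf[1*1024] = buf[9*1024] = 'x';
		return 0;
	}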

which reads two Sun pages (16KB) at a time and modifies a byte in each, to
make sure that copy-on-write, if present, is exercised and that the pages
are faulted in. The following figures are quite approximate, but
representative:

	SunOS	Mbytes	Type	Seconds	KB/sec	I/Os	Machine

	3	10	block	95	100	5100	Sun 3/50
	3	10	raw	20	500	?	Sun 3/50
	4	24	-	45	500	530	Sun 3/280
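The derived columns can be checked against the raw ones. A small sketch,
assuming 1 Mbyte = 1024 KB and the 8KB Sun-3 page size; the struct and
function names are hypothetical:

```c
/*
 * Hypothetical helpers that recompute the table's derived columns.
 * Assumes 1 Mbyte = 1024 KB and an 8KB Sun-3 page.
 */
struct run {
    double mbytes;   /* data read */
    double seconds;  /* elapsed time */
    double ios;      /* I/O operations scheduled */
};

#define SUN3_PAGE_KB 8.0

double kb_per_sec(struct run r)   { return r.mbytes * 1024.0 / r.seconds; }
double kb_per_io(struct run r)    { return r.mbytes * 1024.0 / r.ios; }
double pages_per_io(struct run r) { return kb_per_io(r) / SUN3_PAGE_KB; }
```

On those assumptions the block-device run (10 Mbytes, 95 seconds, 5100
I/Os) works out to about 108 KB/sec at roughly 2KB per I/O, the raw run to
512 KB/sec, and the 24-Mbyte Sun 3/280 run to about 46KB, just under six
8KB pages, per I/O. The ratio of about 108 to 512 KB/sec is consistent with
the 20% figure.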

These results are less optimistic than some that have been posted, but
impressive nonetheless. Getting 500KB/s or so out of a disc is no mean
feat. SunOS manages that both with raw device access under 3 and with
either kind of access under 4. The fact that SunOS 4 gives, with the block
device, the same performance as SunOS 3 on the raw device means that
mapping disc blocks is effective in avoiding the additional overhead
associated with buffer cache management.

Actually, the interesting column is "I/Os", which tells us how many I/O
operations were scheduled. It is not available for SunOS 3 raw devices,
but it tells us (and I have other data that confirms this) that even if we
are reading two pages at a time, SunOS 4 actually fetches six at a time;
that is, it does heavy clustering of I/O requests (hopefully only when it
detects sequential access). This looks like a big win, *for sequential
access*.
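The clustering guessed at here, widening a transfer only when the access
looks sequential, can be sketched as a simple read-ahead heuristic. This is
a hypothetical model, not the SunOS 4 implementation: the names, the
single remembered in-core range, and the fixed CLUSTER_PAGES of 6 (taken
from the measurement) are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical cluster size in pages; 6 mirrors the measurement above. */
#define CLUSTER_PAGES 6

/* Hypothetical read-ahead state for one open file. */
struct ra_state {
    long next_page;  /* page just after the previous request, -1 if none */
    long lo, hi;     /* pages [lo, hi) are assumed to be in core */
};

void ra_init(struct ra_state *s)
{
    s->next_page = -1;
    s->lo = s->hi = 0;
}

/*
 * Request npages starting at page.  Returns how many pages the disc must
 * transfer: 0 if the request is already in core, otherwise the request
 * itself, widened to CLUSTER_PAGES when it continues the previous one.
 * Only the most recent transfer is remembered as in core.
 */
long pages_to_fetch(struct ra_state *s, long page, long npages)
{
    long end = page + npages;
    long fetch = 0;

    if (page < s->lo || end > s->hi) {       /* not (all) in core */
        bool sequential = (page == s->next_page);
        fetch = npages;
        if (sequential && fetch < CLUSTER_PAGES)
            fetch = CLUSTER_PAGES;           /* cluster sequential reads */
        s->lo = page;
        s->hi = page + fetch;
    }
    s->next_page = end;
    return fetch;
}
```

Fed two-page sequential requests, the model issues one small I/O and then
one six-page I/O for every three requests, while an isolated random request
gets only the pages asked for, which is why the win is confined to
sequential access.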

As to the actual faulting and page copying, apparently it does not matter
a lot, given the CPU speed and the overlapping of I/O operations.

So the result here is that, once the buffer cache is eliminated, mapped
devices exploit the available bandwidth well, while going through the
buffer cache, and the strategy function therein, reduces effective
bandwidth to about 20% (roughly 100KB/sec instead of 500KB/sec).

It would be interesting to see the bandwidth reduction due to the filesystem
under both technologies. If anybody wants to have a go... Remember that
you must unmount and remount a filesystem before each test, to invalidate
any in-core pages.
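For anyone who wants to have a go, the measurement can be wrapped in a
timer. A minimal sketch, assuming a POSIX system with clock_gettime(2) (a
modern call, not one SunOS 3 or 4 offered); `timed_read` is a hypothetical
name, and it reads an already-open descriptor the same way the test program
above does:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/*
 * Read fd to end of file in 16KB chunks, dirtying one byte in each 8KB
 * half as the test program does.  Stores elapsed wall-clock seconds in
 * *seconds and returns the number of bytes read, or -1 on a read error.
 */
long long timed_read(int fd, double *seconds)
{
    static char buf[16 * 1024];
    struct timespec t0, t1;
    long long total = 0;
    ssize_t n;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        buf[1 * 1024] = buf[9 * 1024] = 'x';
        total += n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (n < 0)
        return -1;
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```

Open the raw or block device (or a file) read-only, pass the descriptor,
and compute KB/sec as bytes / 1024 / seconds; the unmount and remount step
still applies before each run.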

It would be interesting to see similar numbers for HP-UX (one of whose
incarnations used to have an extent-based filesystem, BTW).
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk