Path: utzoo!attcan!uunet!snorkelwacker!usc!cs.utexas.edu!sun-barr!newstop!sun!opus!gingell
From: gingell%opus@Sun.COM (Rob Gingell)
Newsgroups: comp.arch
Subject: mmap() vs. read() (Was: Re: the Multics from the black lagoon :-))
Message-ID: <131682@sun.Eng.Sun.COM>
Date: 12 Feb 90 17:35:13 GMT
References: <8859@portia.Stanford.EDU> <20571@watdragon.waterloo.edu> <1990Feb12.053616.11455@Solbourne.COM> <3556@rti.UUCP> <10468@alice.UUCP>
Sender: news@sun.Eng.Sun.COM
Reply-To: gingell@sun.UUCP (Rob Gingell)
Organization: Sun Microsystems, Mountain View
Lines: 122

In article <3556@rti.UUCP> trt@rti.UUCP (Thomas Truscott) writes:
>[someone wrote a replacement for "sum", called "fastsum" that uses mmap().]
>
>You are comparing your efficient "fastsum" that happens to use mmap()
>against a sluggardly "sum" that happens to use read().
>(Actually it uses getchar(), which calls _filbuf(),
>maybe _filbuf() uses mmap()?!)

As it happens, no.  Such a change is always possible, but we have not made
it because to date we have not found that stdio would benefit -- the
principal advantage would be to save buffer copy time and memory loading,
but we haven't found a large population of programs where these factors
are dominant.  Perhaps it is because our stdio is otherwise so inefficient,
perhaps it is because the applications themselves are inherently not limited
by I/O buffer copying, or perhaps simply because those programs that were
so limited converted to direct read()/write() operations long ago.

>The following would be a more appropriate test:
>Change your fastsum routine so that instead of mmap()ing
>a megabyte at a time, it does a read() of a megabyte at a time. 
>Compare the mmap() and read() versions of this program.
>I suspect you will find they take about the same amount of time.

I don't think so.  At the very least, the read() version will be slower
than the mmap() version by the amount of time required to effect the
copies from kernel to program buffers.  And this assumes an "optimum"
situation in which the overhead of buffer management in the kernel does
not become significant -- which it will for a large amount of data.  And
it ignores the system effects of essentially doubling the memory load
on the system for both the original file pages and the pages used to
buffer the copies in the application.
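
For readers who want to try the comparison themselves, here is a minimal
sketch of the two variants.  It is not the "fastsum" program from the earlier
article: the function names, the 1 MB chunk size, and the plain additive
checksum are all assumptions made for illustration, and the mmap() variant
maps the whole file at once rather than a megabyte at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Plain additive byte checksum; NOT the algorithm used by sum(1). */
static uint32_t add_bytes(uint32_t acc, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return acc;
}

/* read() variant: each chunk is copied into buf before we look at it. */
int sum_with_read(const char *path, uint32_t *out)
{
    enum { CHUNK = 1 << 20 };           /* hypothetical 1 MB chunk */
    unsigned char *buf = malloc(CHUNK);
    int fd = open(path, O_RDONLY);
    uint32_t acc = 0;
    ssize_t n = -1;

    assert(out != NULL);
    if (buf != NULL && fd >= 0) {
        while ((n = read(fd, buf, CHUNK)) > 0)
            acc = add_bytes(acc, buf, (size_t)n);
    }
    if (fd >= 0)
        close(fd);
    free(buf);
    if (n != 0)                         /* open, malloc, or read failed */
        return -1;
    *out = acc;
    return 0;
}

/* mmap() variant: the file's pages are examined in place, with no copy. */
int sum_with_mmap(const char *path, uint32_t *out)
{
    struct stat st;
    uint32_t acc = 0;
    int fd = open(path, O_RDONLY);

    assert(out != NULL);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {               /* a zero-length mmap() is an error */
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        acc = add_bytes(0, p, (size_t)st.st_size);
        munmap(p, (size_t)st.st_size);
    }
    close(fd);
    *out = acc;
    return 0;
}
```

Both functions should produce the same checksum for the same file; any timing
difference between them is only meaningful if the file is not already
resident in memory when each run starts.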

>On a Sparcstation 1, try timing "cp" vs. the following program:
>
>    main()
>    {
>	    char bfr[8192];
>	    register int n;
>
>	    while ((n = read(0, bfr, sizeof(bfr))) > 0)
>		    write(1, bfr, n);
>    }
>
>I did "/bin/time cp /vmunix /tmp/x"
>and "/bin/time a.out < /vmunix > /tmp/x" several times.
>The results were essentially identical.
>(I did not experiment with buffer sizes, I suspect 16k would be faster.)

I'd be astonished if the results did not always show that access through
mmap() is faster (and it is for this program running on my 3/160).  For the
experiment to be valid, you should be sure that both /vmunix and /tmp/x are
completely flushed from memory after each test run -- otherwise the system's
buffering of the two files will skew the results.  I've never observed a
proper experiment in which mmap() was not faster, though the difference is not
always large.

>There is no inherent reason that read() should be slower
>than mmap() for sequential I/O, since read() is doing precisely
>what is wanted.  Indeed read() should be faster since
>it is conceptually simpler.

Not true.  read() operates by mmap()ing the file and copying it.  And, due to
limitations in the address space available inside the kernel, read() must
often perform more, smaller "mmap()-like" chunk operations than a single
application mmap() could use, using even more CPU time in the process.
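
The "map a chunk, copy it out, repeat" pattern can be imitated at user level.
The sketch below is only an analogy and is not the SunOS kernel code: the
name pread_by_mapping and the two-page WINDOW_PAGES limit are hypothetical,
the latter standing in for the limited address space available inside the
kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size, standing in for scarce kernel address space. */
#define WINDOW_PAGES 2

/* Copy up to len bytes starting at off from a regular file into dst,
 * using nothing but small temporary mappings and memcpy(). */
ssize_t pread_by_mapping(int fd, void *dst, size_t len, off_t off)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    size_t window, done = 0;

    assert(dst != NULL);
    if (page <= 0 || off < 0 || fstat(fd, &st) < 0)
        return -1;
    if (off >= st.st_size)
        return 0;                            /* at or past end of file */
    if (len > (size_t)(st.st_size - off))
        len = (size_t)(st.st_size - off);    /* clamp to end of file */

    window = (size_t)page * WINDOW_PAGES;
    while (done < len) {
        off_t pos = off + (off_t)done;
        off_t base = pos - (pos % (off_t)window);  /* window-aligned start */
        size_t skip = (size_t)(pos - base);
        size_t n = window - skip;
        char *w;

        if (n > len - done)
            n = len - done;
        w = mmap(NULL, window, PROT_READ, MAP_SHARED, fd, base);
        if (w == MAP_FAILED)
            return -1;
        memcpy((char *)dst + done, w + skip, n);   /* the copy read() pays for */
        munmap(w, window);
        done += n;
    }
    return (ssize_t)done;
}
```

Each pass through the loop costs a map, a copy, and an unmap, whereas an
application that calls mmap() itself can establish one mapping for the whole
range and skip the copy entirely.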

>Note that read() can be implemented with memory mapping, in some cases:
>it could map the address of "bfr" to a copy-on-modify kernel page.

This is also not true, though it is a common belief and one that arose
repeatedly during development.  read() gives you a copy of the file data
at the time that the call is executed.  That copy is immutable save for any
action performed by your program.  If read() were implemented *as* mmap(),
then while it is possible to deal with side effects introduced in *your*
machine, it is not, in general, possible to deal with side effects introduced
in other machines -- such as file modifications performed by DOS PCs living
in your network.  It might be possible to make such an assumption save for
heterogeneous environments.  However, it should be noted that neither
MULTICS nor TENEX/TOPS-20 (the latter being the more direct parent of
mmap(), with MULTICS as a more remote ancestor) attempted such an 
optimization either.
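
The difference in semantics is easy to observe.  The sketch below is an
illustration only: snapshot_vs_mapping is a hypothetical name, the scratch
file path is an assumption, and a local pwrite() stands in for "someone else"
changing the file.  It assumes a system where file writes and shared mappings
go through the same page cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 on success and reports what each view holds after the file
 * has been rewritten underneath it:
 *   *via_read - first byte of a buffer filled by pread() before the rewrite
 *   *via_mmap - first byte of a MAP_SHARED mapping made before the rewrite
 */
int snapshot_vs_mapping(char *via_read, char *via_mmap)
{
    char path[] = "/tmp/snapXXXXXX";    /* hypothetical scratch file */
    char buf[4];
    char *map;
    int rc = -1;
    int fd = mkstemp(path);

    assert(via_read != NULL && via_mmap != NULL);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes when fd is closed */

    if (write(fd, "old!", 4) != 4)
        goto out;
    if (pread(fd, buf, 4, 0) != 4)      /* private copy, taken now */
        goto out;
    map = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* Stand-in for another party modifying the file. */
    if (pwrite(fd, "NEW!", 4, 0) == 4) {
        *via_read = buf[0];             /* still the old contents */
        *via_mmap = map[0];             /* follows the file */
        rc = 0;
    }
    munmap(map, 4);
out:
    close(fd);
    return rc;
}
```

The read() buffer keeps the contents it was given, while the mapping tracks
the file.  A copy-on-modify scheme would have to preserve the first behavior,
and that requires the system to see every modification, which it cannot do
for changes made on another machine.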

>As others have pointed out, read() and write() are generally useful
>on streams, and mmap() is not.
>(The SunOS "cp" command falls back to read/write if mmap() fails.
>But since read/write is as fast as mmap(),
>why bother with mmap() in the first place?!)
>
>So what is mmap() good for?  Plenty.
>But it is NOT a replacement for read/write.

Nor is it advertised as such.  Though Mr.  Truscott has not done so, those
deprecating mmap() for not being "device independent" or lacking other
attributes of read()/write() miss the point -- which was never that mmap()
replace read() or write() or otherwise represent some "grail" in the search
for computing enlightenment.  Rather it was to provide an abstraction of
operations in which the system was already engaged (namely file buffering and
physical store multiplexing) in a way that was accessible to applications and
which can increase their flexibility.  A good test of the sufficiency of such
an abstraction is that it is capable of becoming a primitive which you can use
to replace older and various implementations with a common framework -- and in
this we believe mmap() to have been a success.  We also believe it to be an
effective abstraction for those requiring its properties.  But we do not
believe that everyone requires them, for mmap() is certainly a "lower-level"
abstraction than read()/write(), a primitive out of which the latter can be
constructed on memory objects in the same way device drivers provide a
primitive for transfer operations.

Because mmap() is *more* primitive than read()/write(), it can be (as Dennis
Ritchie points out) more cumbersome to use than the equivalent sequence of
read() or write() -- but so would access to raw devices.  If you're
programming around it, it's probably an indication that operating at this
level of the system isn't suitable for your needs, and you should use the
higher abstractions.  The fact that the system supplies an abstraction that
isn't suitable for your use does not lessen the fact that it is an effective
abstraction for others, as well as an effective one for the system to use
in the implementation of abstractions that *are* appropriate for your use.
It's been my experience that most frustrations in the use of memory mapping
techniques in MULTICS, TENEX/TOPS-20, and now with mmap() have come from the
expectation that somehow mmap() was a higher-level operation than it really
is.