Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Page size and linkers (was: Re: SunMMU history)
Message-ID: <PCG.91Feb8204700@teacho.cs.aber.ac.uk>
Date: 8 Feb 91 20:47:00 GMT
References: <MOSS.91Jan21172107@ibis.cs.umass.edu> <45242@mips.mips.COM>
	<1991Jan27.214522.24408@watdragon.waterloo.edu>
	<1991Jan29.033024.1516@craycos.com> <1991Feb4.190949.1190@HQ.Ileaf.COM>
Sender: cho@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 63
Nntp-Posting-Host: teacho
In-reply-to: md@HQ.Ileaf.COM's message of 4 Feb 91 19:09:49 GMT

On 4 Feb 91 19:09:49 GMT, md@HQ.Ileaf.COM (Mark Dionne x5551) said:

md> I have done some experiments with Interleaf (a large publishing
md> program written in C), and found that for many usage patterns,
md> typically about 25% of the code that is paged-in is actually
md> touched.  This can mean that up to 2 meg of memory is often being
md> "wasted".

Not surprising; there are lots of statistics that say that on average
only 200 instructions are executed before a "long jump" is taken;
this means that small page sizes tend to reduce working sets
dramatically. A concise summary of the "armchair" evidence on this is in
Shaw's "The Logical Design of Operating Systems", in the section on
paging performance (page 232 in the second edition).

It would be amusing for you to repeat the same measurements on different
machines, e.g. a Gould PN1 with 16KB/32KB pages, a Sun 3 with 8KB pages,
and all the way down to a BSD VAX with 1KB pages. If you do, please send
a report to SIGOPS or SIGARCH! (post it here first :->). A large body
of statistics already exists on this subject, but some of it is twenty
years old. It would be interesting to see it confirmed with data on
more recent applications/languages.
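
A minimal sketch of the kind of measurement suggested above, assuming
you already have a trace of touched code addresses from somewhere (the
trace format and the names pages_touched and cmp_page are made up for
illustration, not taken from any existing tool). It counts how many
distinct pages a trace touches for a given page size:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Order two page numbers for qsort. */
static int cmp_page(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/*
 * Return the number of distinct pages touched by the n addresses in
 * trace[], for a page size given in bytes.  Returns 0 for an empty
 * trace, and also (ambiguously, this is only a sketch) if memory for
 * the scratch array cannot be allocated.
 */
size_t pages_touched(const unsigned long *trace, size_t n,
                     unsigned long page_size)
{
    unsigned long *pages;
    size_t i, distinct;

    assert(page_size > 0);
    if (n == 0)
        return 0;
    pages = malloc(n * sizeof *pages);
    if (pages == NULL)
        return 0;

    /* Map every address to its page number, then sort. */
    for (i = 0; i < n; i++)
        pages[i] = trace[i] / page_size;
    qsort(pages, n, sizeof *pages, cmp_page);

    /* Count the runs of equal page numbers. */
    distinct = 1;
    for (i = 1; i < n; i++)
        if (pages[i] != pages[i - 1])
            distinct++;

    free(pages);
    return distinct;
}
```

With an invented six-address trace {0, 100, 1500, 9000, 9100, 40000}
this gives 4 pages at 1KB (4KB resident), 3 pages at 8KB (24KB
resident) and 2 pages at 32KB (64KB resident): fewer pages, but far
more memory held for the same touched code.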

md> Simply reordering the .o files helped improve things about 10 to
md> 20%, but structured code tends to put boring initialization routines
md> next to workhorses, etc., preventing one from getting a lot of
md> improvement.

I am fond of mentioning another nice idea similar to 'register', and
with the same excellent benefits/costs ratio: I have been told that some
Algol 68 compiler had a 'rarely' (executed) PRAGMA so that the compiler
would generate the relevant code "offline", thus making the most
frequently executed path as streamlined as possible.
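
As a rough C analogue of that idea (not the Algol 68 PRAGMA itself, and
not standard C): GCC and compatible compilers offer __builtin_expect
and a 'cold' function attribute, which can be wrapped in macros. The
macro names RARELY and COLD and the functions below are hypothetical,
chosen only for this sketch; on other compilers the macros fall back to
doing nothing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
/* Hint that cond is almost always false. */
#define RARELY(cond) __builtin_expect(!!(cond), 0)
/* Ask for the function to be treated as unlikely and kept out of line. */
#define COLD __attribute__((cold, noinline))
#else
/* No hints available: plain C with the same meaning. */
#define RARELY(cond) (cond)
#define COLD
#endif

static int error_count = 0;

/* Rarely executed: error handling kept away from the hot loop. */
static COLD long report_bad_input(size_t index)
{
    (void)index;        /* a real handler would log the position */
    error_count++;
    return -1;
}

/*
 * Sum the n values in v[].  A negative value is treated as a rare
 * error: the sum is abandoned and -1 is returned.
 */
long sum_nonnegative(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (RARELY(v[i] < 0))
            return report_bad_input(i);
        total += v[i];
    }
    return total;
}
```

Whether the compiler really moves the cold code away from the main
path depends on the compiler and its options; the hints only give it
permission to do so.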

md> One thing that would help out here would be a compiler switch that would
md> produce multiple .o files for a single .c file (one .o file per function). 

Actually the better idea is probably to have smart linkers; but, lacking
those, this is not a bad idea. I used to be unhappy with the GNU C++
streams implementation, which was very monolithic, so I wrote an
implementation that had lots of small source files. Space occupied, and
presumably time, decreased dramatically.

md> As someone hinted at, X servers would probably be excellent candidates
md> for this treatment. I've heard rumors that Sun has been doing this.

Actually I think that MIT are doing it. I have been told by somebody
else that most of the time X is executing the same 2KB stretch of
code...  Another terribly bad offender is GNU Emacs. Actually, so is
almost any contemporary program. We do have 'time' profilers readily available,
some as sophisticated as pixie and gprof, but I am not aware of *any*
readily available 'locality' profiler. Control flow profilers like the
two mentioned above can be used to optimize placement of functions; MIPS
uses pixie to optimize for caching, for example.
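
To make the placement idea concrete, here is a deliberately crude
sketch: given per-function call counts from some control flow profiler,
sort the functions so the most frequently called ones come first and
can be linked next to each other. This is only an assumed heuristic for
illustration, not a description of what pixie or MIPS actually do; the
struct layout and the function names are invented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One profiler record (hypothetical layout). */
struct func_profile {
    const char *name;       /* function name */
    unsigned long calls;    /* call count reported by the profiler */
};

/* Order by call count, highest first; break ties by name. */
static int by_calls_desc(const void *a, const void *b)
{
    const struct func_profile *x = a;
    const struct func_profile *y = b;

    if (x->calls != y->calls)
        return (x->calls < y->calls) - (x->calls > y->calls);
    return strcmp(x->name, y->name);
}

/* Reorder f[0..n-1] in place so that hot functions come first. */
void order_hot_first(struct func_profile *f, size_t n)
{
    assert(f != NULL || n == 0);
    if (n > 1)
        qsort(f, n, sizeof *f, by_calls_desc);
}
```

The resulting order could then be used to arrange the functions (or
the .o files, if there is one per function) on the link line. Call
counts alone say nothing about which functions run close together in
time, which is exactly what a real 'locality' profiler would add.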


As an overall comment, I find this posting and the earlier one by a
person from NeXT very refreshing: some people *do* understand and care
about engineering and cost/benefit ratios. It is encouraging.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk