Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Page size and linkers (was: Re: SunMMU history)
Message-ID:
Date: 8 Feb 91 20:47:00 GMT
References: <45242@mips.mips.COM> <1991Jan27.214522.24408@watdragon.waterloo.edu> <1991Jan29.033024.1516@craycos.com> <1991Feb4.190949.1190@HQ.Ileaf.COM>
Sender: cho@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 63
Nntp-Posting-Host: teacho
In-reply-to: md@HQ.Ileaf.COM's message of 4 Feb 91 19:09:49 GMT

On 4 Feb 91 19:09:49 GMT, md@HQ.Ileaf.COM (Mark Dionne x5551) said:

md> I have done some experiments with Interleaf (a large publishing
md> program written in C), and found that for many usage patterns,
md> typically about 25% of the code that is paged-in is actually
md> touched. This can mean that up to 2 meg of memory is often being
md> "wasted".

Not surprising; there are lots of statistics that say that on average
only 200 instructions are executed before a "long jump" is taken. This
means that small page sizes tend to reduce working sets dramatically. A
concise summary of the "armchair" evidence on this is in Shaw's "Logical
design of operating systems", in the section on paging performance (page
232 in the second edition).

It would be amusing for you to repeat the same measurements on different
machines, e.g. a Gould PN1 with 16KB/32KB pages, a Sun 3 with 8KB pages,
and all the way down to a BSD VAX with 1KB pages. If you do, please send
a report to SIGOPS or SIGARCH! (post it here first :->).

A large body of statistics already exists on this subject, but some of
it is twenty years old. It would be interesting to see it confirmed with
data on more recent applications/languages.
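To make the page size argument concrete, here is a minimal sketch that counts how much memory the touched pages occupy for a given instruction trace. Everything in it is an assumption for illustration: the trace is synthetic (straight-line runs of 200 four-byte instructions separated by random long jumps), and the names `resident_bytes` and `synthetic_trace` are made up. It is not a substitute for measuring a real program.

```python
import random


def resident_bytes(addresses, page_size):
    """Bytes of memory held by the pages that the trace touches."""
    pages = {addr // page_size for addr in addresses}
    return len(pages) * page_size


def synthetic_trace(text_size, runs, run_length=200, insn_bytes=4, seed=1):
    """Hypothetical trace: straight-line runs separated by long jumps.

    Each run starts at a random aligned address inside a text segment of
    text_size bytes and executes run_length consecutive instructions.
    """
    rng = random.Random(seed)
    run_bytes = run_length * insn_bytes
    trace = []
    for _ in range(runs):
        start = rng.randrange(0, text_size - run_bytes, insn_bytes)
        trace.extend(range(start, start + run_bytes, insn_bytes))
    return trace


if __name__ == "__main__":
    # Assumed figures: an 8MB text segment and 300 runs between long jumps.
    trace = synthetic_trace(text_size=8 * 1024 * 1024, runs=300)
    for page_size in (1024, 8192, 32768):
        print(page_size, resident_bytes(trace, page_size))
```

For page sizes that are powers of two, the resident bytes can only stay the same or grow as the page size grows, since every large page that is touched contains at least one small page that is touched. How much they grow depends entirely on the trace.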
md> Simply reordering the .o files helped improve things about 10 to
md> 20%, but structured code tends to put boring initialization routines
md> next to workhorses, etc., preventing one from getting a lot of
md> improvement.

I am fond of mentioning another nice idea similar to 'register', and
with the same excellent benefit/cost ratio: I have been told that some
Algol 68 compiler had a 'rarely' (executed) PRAGMA, so that the compiler
would generate the relevant code "offline", thus making the most
frequently executed path as streamlined as possible.

md> One thing that would help out here would be a compiler switch that would
md> produce multiple .o files for a single .c file (one .o file per function).

Actually the better idea is probably to have smart linkers; but, lacking
those, this is not a bad idea. I used to be unhappy with the GNU C++
streams implementation, which was very monolithic, so I wrote an
implementation that had lots of small source files. Space occupied, and
presumably time, decreased dramatically.

md> As someone hinted at, X servers would probably be excellent candidates
md> for this treatment. I've heard rumors that Sun has been doing this.

Actually I think that MIT are doing it. I have been told by somebody
else that most of the time X is executing the same 2KB stretch of
code... Another terribly bad offender is GNU Emacs. Actually, almost any
contemporary program is.

We do have 'time' profilers readily available, some as sophisticated as
pixie and gprof, but I am not aware of *any* readily available
'locality' profiler. Control flow profilers like the two mentioned above
can be used to optimize placement of functions; MIPS uses pixie to
optimize for caching, for example.

As an overall comment, I find this posting and the earlier one by a
person from NeXT very refreshing: some people *do* understand and care
about engineering and cost/benefit ratios. It is encouraging.
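The effect of placing functions by profile data can be sketched as follows. This is only an illustration under stated assumptions: the function names, sizes and call counts are invented, the "hot first" ordering is the simplest possible heuristic, and none of it describes what pixie, gprof or any real linker actually does.

```python
def layout(functions, order):
    """Assign consecutive start offsets to functions in the given order."""
    offsets, cursor = {}, 0
    for name in order:
        offsets[name] = cursor
        cursor += functions[name]["size"]
    return offsets


def hot_pages(functions, order, page_size, threshold=1):
    """Count pages that hold any byte of a function called >= threshold times."""
    offsets = layout(functions, order)
    pages = set()
    for name, info in functions.items():
        if info["calls"] >= threshold:
            first = offsets[name] // page_size
            last = (offsets[name] + info["size"] - 1) // page_size
            pages.update(range(first, last + 1))
    return len(pages)


def hot_first(functions):
    """Hypothetical placement: most frequently called functions first."""
    return sorted(functions, key=lambda n: functions[n]["calls"], reverse=True)


if __name__ == "__main__":
    # Invented profile: initialization routines sit next to workhorses.
    functions = {
        "init_a": {"size": 3000, "calls": 0},
        "work_a": {"size": 500, "calls": 1000},
        "init_b": {"size": 3000, "calls": 0},
        "work_b": {"size": 500, "calls": 900},
    }
    source_order = list(functions)
    print("source order:", hot_pages(functions, source_order, 1024))
    print("hot first:   ", hot_pages(functions, hot_first(functions), 1024))
```

A 'locality' profiler would go further than call counts and record which functions are used close together in time, but even this crude ordering keeps the rarely executed code off the pages that hold the workhorses.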
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk