Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Multics & Memory mapped files
Message-ID: <PCG.90Feb11205739@rupert.cs.aber.ac.uk>
Date: 11 Feb 90 20:57:39 GMT
References: <8859@portia.Stanford.EDU> <20571@watdragon.waterloo.edu>
	<49956@sgi.sgi.com> <4791@helios.ee.lbl.gov> <2093@crdos1.crd.ge.COM>
	<1990Feb7.221800.804@utzoo.uucp> <2106@crdos1.crd.ge.COM>
	<5180@crdgw1.crd.ge.com>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 86
In-reply-to: hammondr@sunroof.crd.ge.com's message of 9 Feb 90 14:19:19 GMT

In article <5180@crdgw1.crd.ge.com> hammondr@sunroof.crd.ge.com (Richard A Hammond) writes:

   Everybody seems to be missing the crucial fact about memory mapped files!
   They ONLY work for cases where the file size is < virtual address space!!!

Not really. You can work around it elegantly.

Unfortunately there are OTHER problems with memory mapped
entities that are not really being addressed by the mapped
memory systems now emerging, which mindlessly clone the Multics
approach.

Still, memory mapping is a *huge* win over the Unix way of doing
things (explicit read and write calls).

The general approach to memory mapping is to assume that mapped
entities exist independently of the process address space, and
that you can map sections of an entity into sections of the
address space; in this way an entity may be much larger than an
address space (see how MUSS does it, SP&E August 1979).
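A minimal Python sketch of the idea, with every name hypothetical: an entity is a bytearray that exists independently of any address space, and an `AddressMap` maps sections of it into non-overlapping windows of a small address space (an assumed 64 KiB). This is not how MUSS implements it; it only shows why the entity can be larger than the address space that views it.

```python
from bisect import bisect_right
from typing import List, Tuple

ADDRESS_SPACE_SIZE = 1 << 16  # hypothetical 64 KiB address space


class AddressMap:
    """Maps sections of independent entities into sections of one address space."""

    def __init__(self) -> None:
        self.starts: List[int] = []  # window start addresses, kept sorted
        # (start, length, entity, offset of the section within the entity)
        self.regions: List[Tuple[int, int, bytearray, int]] = []

    def map(self, start: int, length: int, entity: bytearray, entity_offset: int) -> None:
        if length <= 0 or start < 0 or start + length > ADDRESS_SPACE_SIZE:
            raise ValueError("window does not fit in the address space")
        if entity_offset < 0 or entity_offset + length > len(entity):
            raise ValueError("section lies outside the entity")
        i = bisect_right(self.starts, start)
        prev_overlaps = i > 0 and self.regions[i - 1][0] + self.regions[i - 1][1] > start
        next_overlaps = i < len(self.regions) and start + length > self.regions[i][0]
        if prev_overlaps or next_overlaps:
            raise ValueError("window overlaps an existing mapping")
        self.starts.insert(i, start)
        self.regions.insert(i, (start, length, entity, entity_offset))

    def _translate(self, addr: int) -> Tuple[bytearray, int]:
        # Find the window containing addr and turn it into a position in the entity.
        i = bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, length, entity, offset = self.regions[i]
            if addr < start + length:
                return entity, offset + (addr - start)
        raise MemoryError(f"unmapped address {addr:#x}")

    def load(self, addr: int) -> int:
        entity, pos = self._translate(addr)
        return entity[pos]

    def store(self, addr: int, value: int) -> None:
        entity, pos = self._translate(addr)
        entity[pos] = value
```

For example, a 256 KiB entity can be reached through two 256-byte windows, `am.map(0x1000, 256, entity, 200000)` and `am.map(0x2000, 256, entity, 0)`, after which `am.load(0x2005)` reads byte 5 of the entity even though the entity is four times the size of the whole address space.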

You separate address space from data space entirely; a job may
have several data space segments that are not mapped in any
address space window.

If you give a user program the ability to manipulate its address
space map, this becomes very easy to do; you can even (as in
Mach) have virtual data segments, where the address space fault
handler fakes data instead of mapping in a data space entity.
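A rough Python sketch of user-level fault handling, under stated assumptions: the class, method names and the tiny page size are hypothetical and are not the Mach external pager interface. Each page-aligned region of the address space has a handler; on a fault the handler either copies a page from a backing entity or simply synthesizes the data.

```python
from typing import Callable, Dict, List, Tuple

PAGE = 16  # hypothetical, tiny page size to keep the example readable
Handler = Callable[[int], bytearray]  # page index within the region -> page contents


class PagedSpace:
    def __init__(self) -> None:
        self.pages: Dict[int, bytearray] = {}  # virtual page number -> resident page
        self.regions: List[Tuple[int, int, Handler]] = []  # (first vpn, last vpn, handler)
        self.faults = 0

    def set_handler(self, start: int, length: int, handler: Handler) -> None:
        # Assumes page-aligned regions, as a real MMU would require.
        assert start % PAGE == 0 and length % PAGE == 0
        first = start // PAGE
        self.regions.append((first, first + length // PAGE - 1, handler))

    def load(self, addr: int) -> int:
        vpn, off = divmod(addr, PAGE)
        if vpn not in self.pages:
            self._fault(vpn)
        return self.pages[vpn][off]

    def _fault(self, vpn: int) -> None:
        # The "fault handler": ask the region's handler to produce the page.
        self.faults += 1
        for first, last, handler in self.regions:
            if first <= vpn <= last:
                page = handler(vpn - first)
                if len(page) != PAGE:
                    raise ValueError("handler returned a page of the wrong size")
                self.pages[vpn] = page
                return
        raise MemoryError(f"no handler for page {vpn}")
```

A mapped region might use `lambda i: bytearray(backing[i*PAGE:(i+1)*PAGE])`, while a virtual data segment can use a handler such as `lambda i: bytearray((j*j) % 256 for j in range(i*PAGE, (i+1)*PAGE))` that computes its contents and has no backing entity at all.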

Once you have this, you discover the two real problems with memory
mapped entities. One can be easily solved, the other not so easily.

The first is data space aliasing. You have portions of data space
visible through different ranges of address space, possibly several
times in the same address space, or in different address
spaces (shared memory). This is bad. The cure is to allow it only
for irrevocably read-only segments; the others can only be mapped
in one place at a time, and address spaces take turns at mapping a
segment (this is the MUSS approach: MUSS has convenient
'messages' that pass around permission to map a segment). This
makes for safe interprocess communication, and need not (on a
suitable HW VM architecture) be inefficient at all.
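A minimal sketch of the "take turns" rule, assuming hypothetical `Segment` and `Space` classes (this is not the MUSS message interface): read-only segments may be mapped any number of times, a writable segment has at most one holder anywhere, and sending it to another space transfers the right to map it.

```python
class MapError(Exception):
    pass


class Segment:
    def __init__(self, name: str, read_only: bool = False) -> None:
        self.name = name
        self.read_only = read_only
        self.holder = None  # the one Space currently allowed to map a writable segment


class Space:
    def __init__(self, name: str) -> None:
        self.name = name
        self.windows = []  # mapped segments; a read-only one may appear several times

    def map(self, seg: Segment) -> None:
        if not seg.read_only:
            # A writable segment may be visible through exactly one window in
            # the whole system: no aliasing in this space or in any other.
            if seg.holder is not None:
                raise MapError(f"{seg.name} is already mapped by {seg.holder.name}")
            seg.holder = self
        self.windows.append(seg)

    def unmap(self, seg: Segment) -> None:
        self.windows.remove(seg)  # removes one window onto seg
        if seg.holder is self:
            seg.holder = None

    def send(self, seg: Segment, receiver: "Space") -> None:
        """Pass the permission to map a writable segment to another space."""
        if seg.holder is not self:
            raise MapError(f"{self.name} does not hold {seg.name}")
        self.unmap(seg)
        receiver.map(seg)
```

Because ownership moves with the message, the receiver never sees a segment the sender can still write, which is what makes this form of interprocess communication safe.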

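In particular, abolishing shared memory makes it possible to use
reverse map MMUs (such as those of the ATLAS, the MU6, the ROMP,
and an as yet unpublished design of mine). Since every physical
page frame then holds exactly one virtual page, the translation
table needs only one entry per frame, however large the address
space; this is a big win because it efficiently supports very
large, sparse address spaces.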
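A small Python model of a reverse (inverted) map, with hypothetical names: the table has one entry per physical frame, and a dict stands in for the hardware hash lookup from virtual page to frame. Mapping a second virtual page onto an occupied frame, which is exactly what shared memory would need, is rejected.

```python
from typing import Dict, List, Optional


class InvertedPageTable:
    """One entry per physical frame, recording the single virtual page it holds."""

    def __init__(self, nframes: int) -> None:
        self.owner: List[Optional[int]] = [None] * nframes  # frame -> virtual page number
        self.lookup: Dict[int, int] = {}  # virtual page -> frame (stand-in for the hash table)

    def map(self, vpn: int, frame: int) -> None:
        if self.owner[frame] is not None:
            # A second virtual address for the same frame would be shared
            # memory, which a pure reverse map cannot express.
            raise ValueError(f"frame {frame} already holds page {self.owner[frame]:#x}")
        if vpn in self.lookup:
            raise ValueError(f"page {vpn:#x} is already mapped")
        self.owner[frame] = vpn
        self.lookup[vpn] = frame

    def unmap(self, vpn: int) -> None:
        frame = self.lookup.pop(vpn)
        self.owner[frame] = None

    def translate(self, vpn: int) -> Optional[int]:
        return self.lookup.get(vpn)  # None means a page fault
```

With four frames the table stays four entries long whether the mapped pages are numbered 7 or 2**40, which is the property that makes very large sparse address spaces cheap.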


The second problem is address space aliasing: you may map the
same segment at different times into different portions of the
address space. This means that you cannot store absolute
addresses in a segment (quite apart from the obvious semantic
hazards). The *only* solution is to have a single address space
for all processes, i.e. a capability machine (if you want
protection :->).
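A toy Python illustration of the hazard, assuming a hypothetical segment laid out as a list of words where each list node stores the absolute address of its successor. Walking it at the base it was built for works; mapped elsewhere, a stale pointer either falls outside the segment or, worse, silently lands on the wrong word.

```python
from typing import List

NIL = -1  # hypothetical null pointer value


class DanglingPointer(Exception):
    pass


def build_list(values: List[int], base: int) -> List[int]:
    """Lay out a linked list whose 'next' words are absolute addresses valid only at base."""
    seg: List[int] = []
    for i, v in enumerate(values):
        nxt = base + 2 * (i + 1) if i + 1 < len(values) else NIL
        seg += [v, nxt]  # node layout: [value, next address]
    return seg


def walk(seg: List[int], base: int) -> List[int]:
    """Follow the list, assuming the segment is currently mapped at base."""
    out: List[int] = []
    addr = base if seg else NIL  # the list head is the first node of the segment
    while addr != NIL:
        off = addr - base
        if not 0 <= off < len(seg) - 1:
            raise DanglingPointer(f"address {addr} is outside the segment mapped at {base}")
        out.append(seg[off])
        addr = seg[off + 1]
    return out
```

Built with `build_list([1, 2, 3], 1000)`, the list walks correctly at base 1000, raises `DanglingPointer` at base 5000, and at base 998 quietly returns `[1, 3]`.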

There are palliatives. The early binding (MULTICS) palliative is
to have an impure relocation table for the segment that gets
copied and absolutized whenever the segment is mapped (static
early binding) or whenever addresses in it are used (dynamic early
binding); the late binding palliative (used in MUSS/MUPL, and in
PL/1) is to have relative pointers as a language feature.
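A simplified Python sketch of the two early binding variants, with all names hypothetical and no claim to match Multics internals: the pure part of a segment image is never modified, and what it points at is listed as segment-relative offsets in a table. Static binding copies and absolutizes the whole table at map time; dynamic binding fills in ("snaps") each entry the first time it is used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SegmentImage:
    data: Tuple[int, ...]   # pure part: shareable, never written
    links: Tuple[int, ...]  # relocation table: segment-relative offsets


def bind_static(image: SegmentImage, base: int) -> List[int]:
    """Static early binding: copy the table and absolutize every entry when mapped."""
    return [base + off for off in image.links]


class DynamicLinks:
    """Dynamic early binding: absolutize a table entry on its first use."""

    def __init__(self, image: SegmentImage, base: int) -> None:
        self.image = image
        self.base = base
        self.table: List[Optional[int]] = [None] * len(image.links)  # the impure copy
        self.snapped = 0  # number of entries absolutized so far

    def address(self, i: int) -> int:
        if self.table[i] is None:
            self.table[i] = self.base + self.image.links[i]
            self.snapped += 1
        return self.table[i]
```

The same image mapped at 1000 and at 7000 gets two private tables, `[1000, 1002]` and `[7000, 7002]`, while its pure part stays shared and untouched.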

Which palliative you prefer depends strongly on the type of data
segment and its usage pattern. For example, early binding is
commonly used with code segments, because late binding there
amounts to position independent code, which is not always optimal.
Many data structures are easily implemented with relative pointers,
and they become more compact as well.
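A Python sketch of relative pointers using the standard `struct` module; the node layout is an assumption chosen for illustration. Each node holds a 32-bit value and a 16-bit offset to the next node measured from the node itself (one common variant; offsets from a segment or area base work equally well). Moving the bytes anywhere leaves the list valid, and each link costs 2 bytes instead of a full machine address.

```python
import struct
from typing import List

# Hypothetical node layout: int32 value, int16 self-relative offset (0 = end of list).
# An int16 offset limits links to +/-32 KiB, the price of compactness.
NODE = struct.Struct("<ih")


def build(values: List[int]) -> bytearray:
    buf = bytearray(NODE.size * len(values))
    for i, v in enumerate(values):
        nxt = NODE.size if i + 1 < len(values) else 0  # next node follows directly
        NODE.pack_into(buf, i * NODE.size, v, nxt)
    return buf


def walk(buf: bytes, head: int = 0) -> List[int]:
    """Follow relative links starting at byte offset head; needs a non-empty list."""
    out: List[int] = []
    pos = head
    while True:
        value, rel = NODE.unpack_from(buf, pos)
        out.append(value)
        if rel == 0:
            break
        pos += rel  # relative to the current node, so no base address is needed
    return out
```

Copying the list into the middle of a larger buffer, as in `walk(bytearray(100) + build([5, 6, 7]), 100)`, still yields `[5, 6, 7]` with no relocation pass.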

Intersegment pointers are especially difficult, and usually
require early binding (which may be static, but is more often
dynamic).
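One way to picture dynamic early binding of intersegment pointers, as a hypothetical Python sketch: a stored pointer is a (segment, offset) pair, and each address space resolves it through its own table of segment bases, mapping the segment on first use. The names, the fixed per-segment window, and the naive placement policy are all assumptions.

```python
from typing import Dict, NamedTuple

WINDOW = 0x10000  # hypothetical address range reserved per mapped segment


class SegPtr(NamedTuple):
    segment: str  # hypothetical segment name; a real system might use a number
    offset: int   # offset within that segment


class Binder:
    """Per-address-space table that binds a segment to a base on first use."""

    def __init__(self, first_base: int = WINDOW) -> None:
        self.bases: Dict[str, int] = {}
        self.next_base = first_base
        self.link_faults = 0

    def resolve(self, p: SegPtr) -> int:
        if not 0 <= p.offset < WINDOW:
            raise ValueError("offset outside the segment window")
        if p.segment not in self.bases:
            # First use of this segment: map it and remember where it went.
            self.link_faults += 1
            self.bases[p.segment] = self.next_base
            self.next_base += WINDOW
        return self.bases[p.segment] + p.offset
```

Two processes can hold the same stored `SegPtr("data", 4)` and resolve it to different absolute addresses, because the binding lives in each process's table rather than in the segment.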

A very large database can be built from multiple segments, and if
they are properly implemented there is really no limit on size,
as you slide multiple windows over multiple segments. A suitably
written library can make this virtually painless. Note that
something similar is always needed, except on machines with a
single very large address space.
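A minimal sketch of such a library in Python, assuming hypothetical names and a least recently used policy for choosing which window to slide (the post does not specify one). A fixed number of windows, each a `memoryview` onto one chunk of one segment, are moved around as different parts of different segments are touched.

```python
from collections import OrderedDict
from typing import Dict, Tuple


class WindowedStore:
    """A few fixed-size windows slid over many segments, LRU window replaced first."""

    def __init__(self, segments: Dict[str, bytearray], nwindows: int = 2,
                 window_size: int = 4096) -> None:
        self.segments = segments
        self.nwindows = nwindows
        self.size = window_size
        # (segment name, chunk number) -> view of that chunk, in LRU order
        self.windows: "OrderedDict[Tuple[str, int], memoryview]" = OrderedDict()
        self.slides = 0  # how many times a window had to be (re)mapped

    def _window(self, seg: str, offset: int) -> memoryview:
        key = (seg, offset // self.size)
        if key in self.windows:
            self.windows.move_to_end(key)  # mark as most recently used
        else:
            if len(self.windows) >= self.nwindows:
                self.windows.popitem(last=False)  # slide the LRU window away
            start = key[1] * self.size
            self.windows[key] = memoryview(self.segments[seg])[start:start + self.size]
            self.slides += 1
        return self.windows[key]

    def read(self, seg: str, offset: int) -> int:
        return self._window(seg, offset)[offset % self.size]

    def write(self, seg: str, offset: int, value: int) -> None:
        self._window(seg, offset)[offset % self.size] = value
```

The caller addresses data as (segment, offset) and never sees which window is in use; with only two windows, touching a third chunk quietly slides the least recently used window away.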
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk