Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.arch
Subject: Re: Multics & Memory mapped files
Message-ID:
Date: 11 Feb 90 20:57:39 GMT
References: <8859@portia.Stanford.EDU> <20571@watdragon.waterloo.edu> <49956@sgi.sgi.com> <4791@helios.ee.lbl.gov> <2093@crdos1.crd.ge.COM> <1990Feb7.221800.804@utzoo.uucp> <2106@crdos1.crd.ge.COM> <5180@crdgw1.crd.ge.com>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 86
In-reply-to: hammondr@sunroof.crd.ge.com's message of 9 Feb 90 14:19:19 GMT

In article <5180@crdgw1.crd.ge.com> hammondr@sunroof.crd.ge.com
(Richard A Hammond) writes:

    Everybody seems to be missing the crucial fact about memory mapped
    files!  They ONLY work for cases where the file size is < virtual
    address space!!!

Not really. You can work around it elegantly. Unfortunately there are
OTHER problems with memory mapped entities that are not really being
addressed in the mapped memory systems that are emerging now, which
mindlessly clone the Multics approach. Still, memory mapping is a
*huge* win over the Unix way of doing things.

The general approach to memory mapping is to assume that mapped
entities exist independently of the process address space, and that
you can map sections of an entity into sections of the address space;
in this way an entity may be much larger than an address space (see
how MUSS does it, SP&E August 1979). You separate address space from
data space entirely; a job may have several data space segments that
are not mapped into any address space window.

If you give a user program the ability to manipulate its own address
space map, this becomes very easy to do; you can even (as in Mach)
have virtual data segments, where the address space fault handler
fakes the data instead of mapping in a data space entity.

Once you have this, you discover all the problems with memory mapped
entities.
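
As an aside on the windowing idea above, here is a minimal sketch (in
Python, on top of the ordinary mmap call, not MUSS or Mach) of mapping
one fixed-size section of a file at a time, so that the file may be
larger than the space set aside for it. The names SlidingWindow,
byte_at and demo, and the window size of four allocation units, are
assumptions made for illustration only.

```python
import mmap
import os
import tempfile

# Map offsets must be multiples of this value on the host system.
GRAN = mmap.ALLOCATIONGRANULARITY


class SlidingWindow:
    """Read-only view of a file through one window that is remapped on demand."""

    def __init__(self, path, window=4 * GRAN):
        if window <= 0 or window % GRAN != 0:
            raise ValueError("window must be a positive multiple of GRAN")
        self._file = open(path, "rb")
        self._size = os.path.getsize(path)
        self._window = window
        self._base = None   # file offset where the current window starts
        self._view = None   # the current mapping, if any

    def _slide(self, pos):
        """Make sure the window containing file offset `pos` is mapped."""
        base = (pos // self._window) * self._window
        if base != self._base:
            if self._view is not None:
                self._view.close()          # unmap the old window first
            length = min(self._window, self._size - base)
            self._view = mmap.mmap(self._file.fileno(), length,
                                   access=mmap.ACCESS_READ, offset=base)
            self._base = base

    def byte_at(self, pos):
        """Return the byte stored at absolute file offset `pos`."""
        if not 0 <= pos < self._size:
            raise IndexError(pos)
        self._slide(pos)
        return self._view[pos - self._base]

    def close(self):
        if self._view is not None:
            self._view.close()
            self._view = None
        self._file.close()


def demo():
    """Build a scratch file of 10 full blocks plus a short tail, then probe it."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for k in range(10):
            f.write(bytes([k]) * GRAN)      # block k is filled with the value k
        f.write(bytes([10]) * 123)          # short final block
        path = f.name
    win = SlidingWindow(path)
    try:
        size = 10 * GRAN + 123
        return [win.byte_at(0), win.byte_at(5 * GRAN + 7), win.byte_at(size - 1)]
    finally:
        win.close()
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Only one window is ever mapped at a time, so the address space used is
bounded by the window size and not by the file size. A real library
would presumably keep several windows and write access as well; this
sketch leaves both out.
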
One of these problems can be easily solved, the other not so easily.

The first is data space aliasing. You have portions of data space
visible through different ranges of address space, possibly multiply
in the same address space, or multiply in different address spaces
(shared memory). This is bad. The cure is to allow it only for
irrevocably read only segments; the others can only be mapped in one
place at a time, and address spaces take turns at mapping a segment.
This is the MUSS approach; MUSS has convenient 'messages' that pass
around permission to map a segment. This makes for safe interprocess
communication, and need not (on a suitable HW VM architecture) be
inefficient at all.

In particular, abolishing shared memory makes it possible to use
reverse map MMUs (like the ATLAS, the MU6, an as yet unpublished
design of mine, and the ROMP), which are a big win because they
efficiently support very large, sparse address spaces.

The second problem is address space aliasing. That is, you may map the
same segment at different times in different portions of the address
space. This means that you cannot use absolute addresses in a segment
(quite apart from the obvious semantic hazards). The *only* solution
is to have a single address space for all processes, i.e. a capability
machine (if you want protection :->).

There are palliatives. The early binding (MULTICS) palliative is to
have an impure relocation table for the segment, which gets copied and
absolutized whenever the segment is mapped (static early binding) or
whenever addresses in it are used (dynamic early binding). The late
binding palliative (used in MUSS/MUPL, and in PL/1) is to have
relative pointers as a language feature.

Which of the palliatives you prefer depends strongly on the type of
data segment and its usage pattern. For example, the early binding
approach is commonly used with code segments, because the late binding
one amounts to position independent code, which is not always optimal.
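
To make the late binding palliative concrete, here is a minimal sketch
in Python, assuming a segment that holds a singly linked list whose
links are offsets from the start of the segment instead of absolute
addresses. The layout and the names (HEADER, NODE, NIL, build_segment,
map_at, walk) are all hypothetical; this is not how MUSS/MUPL or PL/1
lay out relative pointers. A bytearray stands in for the address space.

```python
import struct

HEADER = struct.Struct("<I")    # offset of the first node, or NIL
NODE = struct.Struct("<iI")     # (value, offset of the next node or NIL)
NIL = 0xFFFFFFFF                # hypothetical "null" relative pointer


def build_segment(values):
    """Return a segment image whose links are segment-relative offsets."""
    head = HEADER.size if values else NIL
    image = bytearray(HEADER.pack(head))
    for i, value in enumerate(values):
        is_last = i + 1 == len(values)
        nxt = NIL if is_last else HEADER.size + (i + 1) * NODE.size
        image += NODE.pack(value, nxt)
    return bytes(image)


def map_at(space, base, image):
    """Copy the image into the simulated address space at address `base`."""
    space[base:base + len(image)] = image


def walk(space, base):
    """Follow the list in the segment mapped at `base`.

    Every link is turned into an address only at the moment of use,
    by adding the current base; nothing in the segment is rewritten.
    """
    values = []
    (off,) = HEADER.unpack_from(space, base)
    while off != NIL:
        value, off = NODE.unpack_from(space, base + off)
        values.append(value)
    return values


if __name__ == "__main__":
    space = bytearray(8192)
    image = build_segment([3, 1, 4])
    map_at(space, 64, image)        # the same bytes mapped at two bases
    map_at(space, 4096, image)
    print(walk(space, 64), walk(space, 4096))
```

The same segment image is usable at either base without relocation,
which is exactly what absolute addresses would not allow. The price is
one addition per pointer dereference.
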
Many data structures can be easily approached with relative pointers,
and they become more compact as well. Intersegment pointers are
especially difficult, and they usually require the early binding
approach (static, or more often dynamic).

A very large database can be built out of multiple segments, and if
they are properly implemented there is really no limit on size, as you
slide multiple windows over multiple segments. A suitably written
library can make this virtually painless. Note that something similar
is always needed, except on machines with a single very large address
space.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk