Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi)
Newsgroups: comp.arch
Subject: Re: Translating 64-bit addresses
Message-ID:
Date: 9 Mar 91 20:51:21 GMT
References: <6590@hplabsz.HP.COM> <12030@pt.cs.cmu.edu> <6626@hplabsz.HP.COM>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 168
Nntp-Posting-Host: aberdb
In-reply-to: peter@ficc.ferranti.com's message of 7 Mar 91 13:31:29 GMT

On 7 Mar 91 13:31:29 GMT, peter@ficc.ferranti.com (Peter da Silva) said:

peter> In article
peter> pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi) writes:

pcg> I would however still maintain that even with conventional multiple
pcg> address space architectures shared memory is not necessary, as
pcg> sending segments back and forth (remapping) gives much the same
pcg> bandwidth.

peter> I don't think you can really make a good case for this.

Tell that to the people that ported Mach, and 4.3BSD to the PC/RT... :-)

My estimate is that remapping can be done on demand (lazy remapping), not on every context switch, and it does not cost more than a page fault (which is admittedly an extremely expensive operation, but one about which people don't complain).

Also in many cases lazy remapping costs nothing, given the vagaries of scheduling. Suppose a segment is shared between processes 1 and 3; if 1 is deactivated, 2 is activated, and 1 is reactivated, no remapping need take place, because 3 has not accessed it in this sequence. If 3 had accessed it, the OS would have taken a fault on the attempted access, found out that the segment was mapped to process 1, unmapped it from 1, and remapped it onto 3.

I correct myself: far less expensive than a page fault. Probably also less expensive than a process reschedule, even in a properly designed kernel.
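A minimal sketch of the lazy remapping just described, assuming an MMU that can map a given segment into only one process at a time. `struct segment`, `touch_segment` and `run_example_schedule` are hypothetical names; the model does not perform any real unmap/map work, it only tracks the current owner and counts remap faults.

```c
/* Hypothetical model of lazy remapping: the MMU is assumed able to map a
   segment into only one process at a time, so "sharing" is simulated by
   moving the mapping on demand, when another process touches it. */
struct segment {
    int owner;        /* process the segment is currently mapped into */
    int remap_faults; /* remap faults taken so far */
};

/* Called when process `pid` touches the segment.  Returns 1 if a remap
   fault was taken, 0 if the segment was already mapped into `pid`. */
int touch_segment(struct segment *seg, int pid)
{
    if (seg->owner == pid)
        return 0;           /* mapping already in place: no OS involvement */
    /* Remap fault: unmap from the current owner, map into pid. */
    seg->owner = pid;
    seg->remap_faults++;
    return 1;
}

/* The schedule from the text: the segment starts mapped into process 1,
   which runs; process 2 runs without touching it; 1 runs again; then 3
   runs and touches it.  Returns the number of remap faults taken. */
int run_example_schedule(void)
{
    struct segment seg = { 1, 0 };
    touch_segment(&seg, 1); /* 1 runs: already mapped, no fault */
    /* 2 runs: never touches the segment, so nothing happens */
    touch_segment(&seg, 1); /* 1 runs again: still no fault */
    touch_segment(&seg, 3); /* 3 touches it: one fault, moved from 1 to 3 */
    return seg.remap_faults;
}
```

In this schedule only the access by process 3 costs anything: one fault, after which the segment belongs to 3 until some other sharer touches it.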
You lose a lot only if you have a very large number of shared segments, which are shared among a lot but not all processes, and which are all being accessed in every time slice given to each process that shares them. A very, very, very unlikely scenario, and one in which after all the cost is proportional to use, not worse.

Incidentally, avoiding the scenario above is why I think that sharing single pages as opposed to sharing segments is a bad idea: if each process sharing a segment of address space touches more than one page in it, a remap fault occurs on each page. I think (and some statistics seem to support my hunch) that this multiple page access in the same shared segment is a far more frequent phenomenon than multiple shared segment access.

peter> Consider the 80286, where pretty much all memory access for large
peter> programs is done by remapping segments.

Are you sure? I think that in all OSes that run on the 286, except maybe iRMX/286, segments are not unmapped and remapped, but stay always mapped, and can be and are shared.

peter> Loading a segment register is an expensive operation,

Around 20-30 cycles, if memory serves me right. Compared to a context switch it is insignificant. And in any case the 286 MMU does support shared segments directly, so there is no need to do segment remapping to simulate shared memory.

This said, your comments about the 286 MMU are irrelevant to a discussion on the acceptability of the cost of simulating shared memory by remapping segments on demand, or at a context switch, in each process that has them nominally attached. This discussion is important only when comparing reverse map MMUs with straight map MMUs, and only when the reverse map MMU does not (unlike mine) support shared segments, and only when shared segments are deemed useful.

Yet in your discussions of the 286 MMU there are some common fallacies and myths, and they merit some comment.
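The earlier point about sharing granularity (with page-level sharing, a remap fault occurs on every page touched) can be illustrated with a small counting model. This is a simplified sketch under stated assumptions: two hypothetical processes alternate time slices, and every slice touches the same `pages_touched` pages of one shared segment. `count_remap_faults` and `MAX_PAGES` are illustrative names only.

```c
#define MAX_PAGES 64

/* Count remap faults when processes 0 and 1 alternate time slices and
   each slice touches `pages_touched` distinct pages of one shared
   segment.  If `per_page` is nonzero, ownership is tracked per page
   (page-granularity sharing); otherwise one owner covers the whole
   segment.  Returns -1 if pages_touched is out of range. */
int count_remap_faults(int slices, int pages_touched, int per_page)
{
    int owner[MAX_PAGES];
    int units = per_page ? pages_touched : 1;
    int faults = 0;

    if (pages_touched < 1 || pages_touched > MAX_PAGES)
        return -1;
    for (int u = 0; u < units; u++)
        owner[u] = -1;          /* nothing mapped yet: first touch faults */
    for (int s = 0; s < slices; s++) {
        int pid = s % 2;        /* processes 0 and 1 alternate slices */
        for (int p = 0; p < pages_touched; p++) {
            int u = per_page ? p : 0;
            if (owner[u] != pid) {
                owner[u] = pid; /* remap fault: move this unit to pid */
                faults++;
            }
        }
    }
    return faults;
}
```

With 10 slices and 8 pages touched per slice, segment-granularity sharing takes 10 remap faults and page-granularity sharing takes 80: under this model the cost grows with the number of pages touched rather than with the number of shared segments.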
Note first that it is only because of a design misconception (not quite a mistake) of the 286 designers that loading a segment register is so expensive. The problem is that the shadow segment registers are not like TLBs, in that they are reloaded every time, even if the shadowed segment register *value* has not changed. This could easily have been avoided by simply comparing the old and new segment register values. It was not, only because conceivably the segment descriptors could have been altered even if the value of the segment register had not in fact changed, and the 286 has no distinct "flush shadow segment registers" instruction. I guess that the designers assumed that in their "Pascal"/"Algol" model of process execution each segment register was dedicated to a specific function (code, stack, global, var parameters) and was not supposed to be reloaded often, so there was no need to treat the shadows as caches.

peter> and is to a large extent the cause of the abysmal behaviour of
peter> large programs on that architecture. For an extreme case, the
peter> sieve slows down by a factor of 11 once the array size gets over
peter> 64K.

This is probably only because the HUGE model gets used, which implies funny code to simulate 32 bit address arithmetic (the HUGE model is so expensive because of the mistake of putting the ring number in the middle of a pointer instead of in the most significant bits). On less extreme examples, or if you code the sieve for the LARGE model, the slowdown is around 20-50%, even for extremely pointer intensive operations. Your figure of 11 is plainly ridiculous and warped by the machinations of the HUGE model; after all, 32 bit pointer dereferences are only about 3 times slower than 16 bit pointer dereferences, so even a program that consisted *only* of them would be only 3 times slower.
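The "funny code" the HUGE model needs can be sketched as follows. In a 286 protected-mode far pointer the 16-bit selector holds the descriptor index in bits 15..3, the table indicator (TI) in bit 2 and the requested privilege level (RPL, the ring number) in bits 1..0, so the RPL sits in the middle of the 32-bit selector:offset pair. The sketch assumes, hypothetically, that the OS allocated consecutive descriptors for a huge object, so each 64K chunk's selector is 8 higher than the previous one; the real increment is OS-dependent. `struct far_ptr`, `huge_add`, `naive_add` and `HUGE_SELECTOR_INCREMENT` are illustrative names, not any compiler's runtime API.

```c
#include <stdint.h>

/* A 286 protected-mode far pointer: 16-bit selector, 16-bit offset.
   Selector layout: bits 15..3 descriptor index, bit 2 table indicator,
   bits 1..0 requested privilege level (the ring number). */
struct far_ptr {
    uint16_t selector;
    uint16_t offset;
};

/* Assumption: consecutive descriptors for a huge object, so the next
   64K chunk is one descriptor slot (8) higher.  OS-dependent. */
#define HUGE_SELECTOR_INCREMENT 8u

/* HUGE-model pointer addition: each carry out of the 16-bit offset must
   become a selector step of 8, because TI and RPL sit between the
   descriptor index and the offset. */
struct far_ptr huge_add(struct far_ptr p, uint32_t n)
{
    uint32_t linear = (uint32_t)p.offset + n;  /* may exceed 64K */
    p.selector = (uint16_t)(p.selector + (linear >> 16) * HUGE_SELECTOR_INCREMENT);
    p.offset   = (uint16_t)(linear & 0xFFFFu);
    return p;
}

/* What a plain 32-bit add does to the same pointer viewed as one
   selector:offset word: the carry lands in the RPL bits instead of
   advancing the descriptor index. */
uint32_t naive_add(uint32_t seg_off, uint32_t n)
{
    return seg_off + n;
}
```

For example, adding 0x20 to selector 0x0017 (LDT index 2, RPL 3), offset 0xFFF0, with `huge_add` gives selector 0x001F, offset 0x0010: the next LDT descriptor, same TI and RPL. `naive_add` on 0x0017FFF0 instead yields selector 0x0018, which is a GDT selector with RPL 0, a different segment altogether. Had the ring number been in the most significant bits, the plain add would have sufficed.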
Note again that this point about 32 bit pointer arithmetic on a 286 has *nothing* to do with the cost of simulating shared memory by remapping when the MMU does not support it directly.

peter> My own experience with real codes under Xenix 286 bears this out.

Maybe. *My* experience of recompiling large numbers of Unix nonfloat utilities on a 286 tells me that the average slowdown is around 30%. A 10 MHz 286 is about the equivalent of a PDP-11/73 (1 "MIPS") in the small model or of a VAX-11/750 (0.7 "MIPS") in the large model, to all practical (nonfloat Unix applications :->) purposes.

peter> Think of the 80286 as an extreme case of what you're proposing.

I seem to have completely failed to explain myself. The 286 is *irrelevant* to a discussion on shared memory simulation by implicit OS-supported or explicit application-requested segment remapping (which I prefer).

peter> I think it's clear from this experience that frequent reloading
peter> of segment registers is a bad idea.

No, the conclusion is not supported by the 286 example; the 286 is uniquely poor at reloading segment registers because it does not treat shadow segment registers as a cache and because its pointers have an unfortunate format. Properly designed MMUs with properly designed TLBs, even reverse map ones, do segment remapping with small or insignificant cost, not worse than the 286 MMU.

Moreover the real overhead lies not in reloading some lines in the MMU or the TLB; it is in taking the remap fault and in searching the appropriate kernel structures to find which (nominally shared) segment to map in that region.

peter> After your discussion of the inappropriate use of another
peter> technology, networks, I would have expected you'd know better.

I am sorry I got myself so badly misunderstood.

peter> As for single address space machines, my Amiga 1000's exceptional
peter> performance... given the slow clock speed and dated CPU (7.14
peter> MHz 68000)...
peter> tends to suggest that avoiding MMU tricks might be
peter> a good idea here as well.

MMUs are a difficult subject. A lot of vendors have bungled their MMU designs, the OS code that supports them, and the VM policies that drive them. Sun is just *one* of the baddies. That a lot of vendors take many years to get their act together (if ever) on virtual memory does not mean that it is a bad technology; it means that maybe it is too subtle for mere Unix kernel hackers.

peter> The Sparcstation 2 is the first UNIX workstation I've seen with
peter> as good response time to user actions. It's only a 27 MIPS
peter> machine... or approximately 40 times faster.

The MIPS-eating Sun bogons strike again! :-)

The people that did Tripos (Martin Richards!) and Amiga (and those that now maintain them at CBM) seem to be quite another story. I am another Amiga fan :-). Now, if only they could get their act together commercially... (please redirect the ensuing flame war to the appropriate newsgroup :->).

-- 
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk