Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Antonio Grandi)
Newsgroups: comp.arch
Subject: Re: Translating 64-bit addresses
Message-ID: <PCG.91Mar11144147@aberdb.cs.aber.ac.uk>
Date: 11 Mar 91 14:41:47 GMT
References: <6590@hplabsz.HP.COM> <12030@pt.cs.cmu.edu> <6626@hplabsz.HP.COM>
	<PCG.91Mar9205121@aberdb.cs.aber.ac.uk> <92-9BOB@xds13.ferranti.com>
Sender: aro@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 79
Nntp-Posting-Host: aberdb
In-reply-to: peter@ficc.ferranti.com's message of 10 Mar 91 17:50:11 GMT

On 10 Mar 91 17:50:11 GMT, peter@ficc.ferranti.com (peter da silva) said:

peter> Hasn't the PC/RT been found to have surprisingly poor performance
peter> once the number of context switches involved get too high?

I don't know, but this could be for many other reasons. I remember
having seen hints that the RS/6000 does badly on context switching, but
whether this is due to shared memory simulation or rather to one of a
million probable bogosities in the OS I cannot know.

peter> Once you want to access more than 256K (64K for each of DS, SS,
peter> CS, and ES) you *have* to reload the segment registers. The
peter> machine can *not* directly address more than 64K per segment, and
peter> it only has the 4 segment registers.  This is a hard limit unless
peter> you start reloading segment registers... which is sufficiently
peter> expensive to have an exquisitely painful impact on performance.

Maybe you tired of reading my article before its end, but I maintain
that the 286 large model, even in pointer-expensive programs, has at
most a 50% average slowdown compared with the small model, except for
pathological cases. Such pathological cases are easy to find for
every cache organization, as you will readily concede. Accessing two
arrays that happen to map to the same cache lines kills almost every
machine out there, for one thing...
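
To make the cache example concrete, here is a minimal sketch of a
direct-mapped cache walked by two arrays in lockstep. The geometry (256
lines of 16 bytes, 4-byte elements) and all the names are assumptions
chosen for illustration, not the parameters of any particular machine.

```python
def count_misses(base_a, base_b, n, elem_size=4, line_size=16, num_lines=256):
    """Count misses in a direct-mapped cache for a[0], b[0], a[1], b[1], ..."""
    tags = [None] * num_lines          # which memory line each cache slot holds
    misses = 0
    for i in range(n):
        for base in (base_a, base_b):
            addr = base + i * elem_size
            line_no = addr // line_size    # memory line containing addr
            index = line_no % num_lines    # cache slot that line must use
            if tags[index] != line_no:     # slot holds another line: miss
                tags[index] = line_no
                misses += 1
    return misses

CACHE_BYTES = 16 * 256   # hypothetical 4 KB cache

# Bases exactly one cache size apart: a[i] and b[i] fight for the same slot.
conflicting = count_misses(0, CACHE_BYTES, 1024)

# Shift b by one line and the two arrays stop evicting each other.
friendly = count_misses(0, CACHE_BYTES + 16, 1024)
```

In this toy model the conflicting layout misses on every one of its 2048
accesses, while the shifted layout takes only the 512 misses needed to
bring each line in once.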

That the shadow register organization of the 286 is misguided I have
been ready to concede, but that should not color one's judgement of the
merits of shared memory simulation via remapping for reverse MMUs, or of
the merits of segmented architectures in general.

I also have the impression that you loathe the 286 two dimensional
addressing scheme so much that you detest all segmentation schemes as
well, but the two issues are unrelated. Most paged and segmented VM
systems have linear addressing, e.g. the 370, or the VAX-11, and so on.

peter> Loading a segment register is an expensive operation,

pcg> Around 20-30 cycles if memory serves me right. Compared to a
pcg> context switch it is insignificant.

peter> But it happens so much more often.

Dereferencing a far pointer costs only three times as much as
dereferencing a near pointer, and not every instruction is a far
pointer dereference.
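
The arithmetic behind this can be sketched as a weighted average. The
model below assumes that pointer dereferences account for some fraction
of small model running time and that each one costs three times as much
in large model; the function names and that way of measuring the
fraction are my assumptions for illustration.

```python
def large_model_slowdown(deref_fraction, far_cost_ratio=3.0):
    """Large model time relative to small model time (1.0 = same speed).

    deref_fraction: share of small model time spent dereferencing pointers
    far_cost_ratio: cost of a far dereference relative to a near one
    """
    other = 1.0 - deref_fraction            # work unaffected by the model
    return other + deref_fraction * far_cost_ratio

def deref_fraction_for(slowdown, far_cost_ratio=3.0):
    """Dereference share that would produce the given relative slowdown."""
    return (slowdown - 1.0) / (far_cost_ratio - 1.0)

half_again = large_model_slowdown(0.25)      # 25% of time in dereferences
needed = deref_fraction_for(1.5)             # share implied by a 50% slowdown
```

Under these assumptions a 50% slowdown corresponds to a program that
spends a quarter of its small model time dereferencing pointers.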

Also, when one does segment remapping, one really twiddles the contents
of a field in the LDT (the page table), not that of the segment
registers, and at most once per context switch (and this does not happen
on most context switches). The cost of reloading a segment register and
of remapping a segment are therefore totally unrelated.

peter> Well, only that in the case you're talking about the cost of
peter> remapping the segments is even higher.

True... But not tragic. Taking a trap, finding out which segment should
be remapped, fiddling the LDT of the process that had the segment
mapped, and remapping it might cost maybe as much as reading a block off
the buffer cache, i.e. a few hundred instructions.
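
The sequence just described can be sketched as a toy model in which a
segment may be mapped by only one process at a time and a fault moves
the mapping. Every class and method name here is hypothetical, and the
LDT is reduced to a table of present flags; it is an illustration of the
idea, not of any real kernel.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.ldt = {}            # segment id -> True if mapped (present)

class SharedSegment:
    def __init__(self, seg_id, size=16):
        self.seg_id = seg_id
        self.owner = None        # process that currently has it mapped
        self.data = bytearray(size)

class Kernel:
    def __init__(self):
        self.remaps = 0          # number of traps that moved a mapping

    def fault(self, proc, seg):
        """Trap handler: unmap from the old owner, map into the faulter."""
        if seg.owner is not None:
            seg.owner.ldt[seg.seg_id] = False   # fiddle the old owner's LDT
        proc.ldt[seg.seg_id] = True             # map into the new owner
        seg.owner = proc
        self.remaps += 1

    def access(self, proc, seg, offset, value=None):
        """Read (value is None) or write one byte, faulting if not mapped."""
        if not proc.ldt.get(seg.seg_id, False):
            self.fault(proc, seg)
        if value is None:
            return seg.data[offset]
        seg.data[offset] = value
        return value

kernel = Kernel()
a, b = Process("a"), Process("b")
seg = SharedSegment(7)

kernel.access(a, seg, 0, 42)     # first touch by a: one remap
kernel.access(a, seg, 1, 43)     # already mapped: no trap
got = kernel.access(b, seg, 0)   # b touches it: mapping moves from a to b
kernel.access(b, seg, 1)         # already mapped: no trap
```

Only the first access after ownership changes pays for the trap; the
accesses that follow run at full speed until the other process touches
the segment again.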

I would think that it is of the order of a page fault (mind you, maybe
I was not clear before: just the CPU cost of a page fault, not the many
milliseconds of IO time possibly associated with it), and less
frequent.

I remember that the BSD VM subsystem that used a like technique to
simulate a 'referenced' bit for each page (take a fault and map the page
in) cost less than 5% on a VAX-11/780, and that was for much more
frequent faulting.
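
For readers who have not met the trick, here is a minimal sketch of
simulating a 'referenced' bit in software: pages are marked invalid, the
first touch takes a cheap fault that marks the page valid and
referenced, and a periodic sweep collects the pages nobody touched. The
names and structure are my own simplification, not the BSD code.

```python
class Page:
    def __init__(self):
        self.valid = False        # what the hardware checks
        self.referenced = False   # maintained by the fault handler only

def touch(page, stats):
    """Access a page; only the first touch after a sweep takes a fault."""
    if not page.valid:
        stats["soft_faults"] += 1
        page.valid = True         # map the page back in
        page.referenced = True    # record that it was used

def sweep(pages):
    """Return indices of pages untouched since the last sweep, then re-arm."""
    idle = [i for i, p in enumerate(pages) if not p.referenced]
    for p in pages:
        p.valid = False
        p.referenced = False
    return idle

pages = [Page() for _ in range(4)]
stats = {"soft_faults": 0}
for _ in range(1000):             # heavy use of pages 0 and 2
    touch(pages[0], stats)
    touch(pages[2], stats)
idle = sweep(pages)
```

Two thousand accesses cost only two faults here, which is why the
overhead stays small as long as sweeps are far less frequent than
accesses.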

People do IPC using pipes or System V MSGs or sockets which cost far far
far more.

On machines like the 286 that can share segments simultaneously, pure
shared memory is OK. On those that cannot, like the RT, the cost of
simulating it is not excessive, and probably lower than that of most
alternatives.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk