Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.os.mach
Subject: Re: Mach performance? [Long]
Message-ID: <PCG.89Dec23140426@rupert.cs.aber.ac.uk>
Date: 23 Dec 89 14:04:26 GMT
References: <cZYIzoW00jdX4CVWFY@cs.cmu.edu> <QZYJhvu00hYP4Qa1N2@cs.cmu.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 142
In-reply-to: Richard.Draves@CS.CMU.EDU's message of 21 Dec 89 22:38:51 GMT

In article <QZYJhvu00hYP4Qa1N2@cs.cmu.edu> Richard.Draves@CS.CMU.EDU writes:

   Excerpts from mail: 21-Dec-89 Re: Mach performance? [Long]
   Rick.Rashid@CS.CMU.EDU (1425)

   > Actually, Rich is only partly correct.

   I didn't mention the "short-circuit" path in my brief description of
   remote IPC because I don't think it is usable.  It is an experimental
   option.  I don't even know if the code would still compile if one turned
   on the option.  It certainly isn't in use anywhere.  NeXT tried to put
   the short-circuit code into their production system and found it was too
   buggy to use; they had to back it out.

I have been aware of Rashid's Accent IPC for almost ten years,
and about five years ago I ported it to System V and wrote a
netmsgserver for it.  It is quite possible, and I think fairly
easy (I did not actually do it), to move the netmsgserver into
the kernel, so that context switch time is essentially nullified.

This is essentially what 4.xBSD does; by default its
netmsgserver is stuck right in the kernel. Not many seem to have
noticed that this is not the only option you have under 4.xBSD;
indeed there are two alternatives:

1) all user programs could use the Unix domain, where you can
(modulo some bugs) send file descriptors between processes.  All
processes open Unix domain connections to a network server
process, and this is the only one that opens TCP/IP or whatever
connections to the outside world. You can use the existing TCP
sockets, or use raw IP sockets and reimplement TCP in the server,
or whatever. This is exactly as in Mach.

2) an unimplemented feature of 4.xBSD is user-implemented IPC
domains.  The idea was to give user processes the ability to
register with the kernel as servers for sockets of some
particular domain, and the kernel would pass to that process all
operations on sockets of that domain. This facility has never
been implemented, just like 4.xBSD wrappers (it actually is
related to them).
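
As a minimal sketch of the descriptor passing that option 1)
relies on, the following uses the present-day Python wrappers
socket.send_fds() and socket.recv_fds() (Python 3.9 or later, Unix
only), which sit on top of sendmsg()/recvmsg() with SCM_RIGHTS
ancillary data. The function name and the use of a pipe as a
stand-in for a connection opened by the network server are
assumptions made purely for illustration; both ends live in one
process only to keep the example self-contained.

```python
import os
import socket


def pass_descriptor_demo() -> bytes:
    """Pass a pipe's read end over a Unix domain socket pair, then read through it."""
    # Two connected Unix domain sockets. Normally one end would belong to a
    # user process and the other to the network server process.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # The pipe stands in for a connection the server opened on the client's
    # behalf (hypothetical; any open descriptor could be passed).
    read_fd, write_fd = os.pipe()
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data alongside a
        # one-byte ordinary payload.
        socket.send_fds(server_end, [b"x"], [read_fd])
        _data, fds, _flags, _addr = socket.recv_fds(client_end, 1, 1)
        received_fd = fds[0]  # new descriptor number, same underlying open file
        try:
            os.write(write_fd, b"hello")
            # Reading through the received descriptor sees the pipe's data.
            return os.read(received_fd, 16)
        finally:
            os.close(received_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
        server_end.close()
        client_end.close()
```

The receiving side never names the object behind the descriptor;
it only holds a local handle to it, which is the property the
Accent/Mach port model shares.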

Interestingly enough, option 1) is also possible under streams,
and I am quite sure that the two crucial points of Rashid's IPC,
the ability to send file/port descriptors with messages and the
access to global addresses only through address-space-local
file/port descriptors, have been inspired in both cases by Accent
(even if both points are circumvented under 4.xBSD by direct
access to a kernel based netmsgserver).

   I think the short-circuit code was a successful experiment.  The
   improved times it produced confirmed that the netmsgserver is a
   bottleneck in remote Mach IPC.

Naturally all this netmsgserver trouble happens because of a
fundamental limitation of Accent/Mach: the inability of threads
to change address space and, possibly, to have multiple address
spaces mapped together (yes, I know about sharing address
spaces, it's not quite the same thing). I suspect that these
limitations are also there because otherwise the architecture
would be very different from the Unix one, and CMU were badly
burnt by Accent, which was too unlike Unix.

Context switching for RPC implies three distinct overheads:
security checking, address space switching, thread switching.

There are therefore three possible levels of extra sophistication
beyond Accent/Mach (which is already two levels beyond 4.xBSD):

1) If it were possible for threads to jump between address
spaces, the thread switching overhead would be nullified.

2) If it were possible to map multiple address spaces together,
even address space switching would be nullified.

3) If it were possible to inform the OS that an address space trusted
another, security checking in that direction would be eliminated.
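
The arithmetic behind these three levels can be sketched as a toy
cost model. Everything in it is an assumption made for
illustration: the class and function names are hypothetical, the
costs are arbitrary units rather than measurements, and a round
trip is taken to be one crossing in each direction, with trust
removing the check in one direction only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossingCosts:
    """Per-crossing overheads in arbitrary units (placeholders, not measurements)."""
    security_check: float
    address_space_switch: float
    thread_switch: float


def round_trip_overhead(costs: CrossingCosts, *,
                        thread_migrates: bool = False,
                        spaces_comapped: bool = False,
                        callee_trusted: bool = False) -> float:
    """Overhead of one RPC round trip (call plus return) under the given options."""
    # Level 1: a thread that jumps between address spaces needs no thread switch.
    thread = 0.0 if thread_migrates else 2 * costs.thread_switch
    # Level 2: address spaces mapped together need no address space switch.
    space = 0.0 if spaces_comapped else 2 * costs.address_space_switch
    # Level 3: trusting the callee removes the check in that direction only,
    # so one of the two checks remains.
    checks = (1 if callee_trusted else 2) * costs.security_check
    return thread + space + checks


unit = CrossingCosts(security_check=1.0, address_space_switch=1.0, thread_switch=1.0)
baseline = round_trip_overhead(unit)
all_three = round_trip_overhead(unit, thread_migrates=True,
                                spaces_comapped=True, callee_trusted=True)
```

With every overhead set to one unit, the baseline round trip
carries six units of overhead and the fully optimized one carries
a single remaining security check.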

As an historical note, Multics had all three for communication
between *rings* in the same address space, and even had support
in hardware to do 3) in the reverse direction. Capability
machines with a single global address space for all threads are
best of course, and since security checking is automagically done
in both directions by hardware, point 3) is moot.

An OS called Psyche (from Rochester, not by chance) allows you to
do all three things on fairly conventional (non-capability,
non-ring) hardware; you can then have your netmsgserver comapped
with your user address space, the thread that wants to do the
network RPC just jumps to the netmsgserver code, and since you
say that you trust it, half of the checking is eliminated as
well.

I think that this should give excellent performance; I have had
correspondence with other people working on similar lines (e.g.
from AT&T), and from our limited data it is apparent you do not
pay much more than for a local procedure call (and maybe even
less than an intra address space thread rendezvous, as you don't
have synch and thread switch costs).

I have been working (since 1983 on and off... but I have now
apparently found a way to switch to full time for this) on
something that does 1) by default, but will only do 2) for
selected, statically configured, modules, and 3) only for the
kernel. While this is a less general mechanism than Psyche, I
think that the Psyche mechanism is excessively fine grained for
my tastes, and I'd rather be more restrictive, and not even offer
the option to do 2) and 3) in a general way.

There is of course a difference in perspective: I am a
minimalist, and I don't want to add mechanisms that are not
relevant, or may even encourage programming styles at variance
with my target environment, distributed systems, where it is
important to cut overheads, but also to encourage the programmer
to be aware of communication boundaries, and not to expect to be
able to map address spaces together, as they may be on different
machines. It is *possible* to support distributed shared memory
transparently; indeed it is in principle possible and fairly easy
with the existing Accent/Mach architecture. But I think that
hiding communication boundaries, while attractive from a
conceptual point of view, would also hide the underlying reality
in terms of cost and reliability. Also, most current machines do
not have virtual addresses long enough that you can expect to map
many user address spaces together.

The Psyche people have of course a completely different attitude,
and I would dare say that to them points 2) and especially 3) are
the most important, because their target is a NUMA machine: a
not-too-loosely coupled multiprocessor, not a distributed system
(possibly over a wide area), and one where hardware effort has
been expended to provide a credible, efficient illusion of global
shared memory that should be exploited.  (Mach seems ever more
oriented to very closely coupled multiprocessors.)

In other words, my reckoning is that while in current Accent/Mach
kernels there is a "short-circuit" path as an exception to the
normal mechanism, the software architecture should be designed so
that such a path is actually the standard.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk