Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.os.mach
Subject: Re: Mach performance? [Long]
Message-ID:
Date: 23 Dec 89 14:04:26 GMT
References:
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 142
In-reply-to: Richard.Draves@CS.CMU.EDU's message of 21 Dec 89 22:38:51 GMT

In article Richard.Draves@CS.CMU.EDU writes:

   Excerpts from mail: 21-Dec-89 Re: Mach performance? [Long]
   Rick.Rashid@CS.CMU.EDU (1425)

   > Actually, Rich is only partly correct.

   I didn't mention the "short-circuit" path in my brief description of remote IPC because I don't think it is usable. It is an experimental option. I don't even know if the code would still compile if one turned on the option. It certainly isn't in use anywhere. NeXT tried to put the short-circuit code into their production system and found it was too buggy to use; they had to back it out.

I have been aware of Rashid's Accent IPC for almost ten years, and about five years ago I ported it to System V and wrote a netmsgserver. It is quite possible, and I think fairly easy (though I did not actually do it), to move the netmsgserver into the kernel, so that context switch time is essentially nullified. This is what 4.xBSD essentially does; by default their netmsgserver is stuck right in the kernel.

Not many seem to have noticed that this is not the only option you have under 4.xBSD; indeed there are two alternatives:

1) all user programs could use the Unix domain, where you can (modulo some bugs) send file descriptors between processes. All processes open Unix domain connections to a network server process, and this is the only one that opens TCP/IP or whatever connections to the outside world. You can use the existing TCP sockets, or use raw IP sockets and reimplement TCP in the server, or whatever. This is exactly like in Mach.
2) an unimplemented feature of 4.xBSD is user implemented IPC domains. The idea was to give user processes the ability to register with the kernel as servers for sockets of some particular domain, and the kernel would pass to that process all operations on sockets of that domain. This facility has never been implemented, just like 4.xBSD wrappers (it actually is related to them).

Interestingly enough, option 1) is also possible under streams, and I am quite sure that the two crucial points of Rashid's IPC, the ability to send file/port descriptors with messages and the access to global addresses only through address space local file/port descriptors, have been inspired in both cases by Accent (even if both points are circumvented under 4.xBSD by direct access to a kernel based netmsgserver).

   I think the short-circuit code was a successful experiment. The improved times it produced confirmed that the netmsgserver is a bottleneck in remote Mach IPC.

Naturally all this netmsgserver trouble happens because of a fundamental limitation of Accent/Mach: the inability of threads to change address space and, possibly, to have multiple address spaces mapped together (yes, I know about sharing address spaces, it's not quite the same thing). I suspect that these limitations are also there because otherwise the architecture would be very different from the Unix one, and CMU were badly burnt by Accent, which was too unlike Unix.

Context switching for RPC implies three distinct overheads: security checking, address space switching, thread switching. There are therefore three possible levels of extra sophistication beyond Accent/Mach (which is already two levels beyond 4.xBSD):

1) If it were possible for threads to jump between address spaces, the thread switching overhead would be nullified.

2) If it were possible to map multiple address spaces together, even address space switching would be nullified.
3) If it were possible to inform the OS that an address space trusted another, security checking in that direction would be eliminated.

As a historical note, Multics had all three for communication between *rings* in the same address space, and even had support in hardware to do 3) in the reverse direction. Capability machines with a single global address space for all threads are best of course, and since security checking is automagically done in both directions by hardware, point 3) is moot.

An OS called Psyche (from Rochester, not by chance) allows you to do all three things on fairly conventional (non-capability, non-ring) hardware; you can then have your netmsgserver comapped with your user address space, the thread that wants to do the network RPC just jumps to the netmsgserver code, and since you say that you trust it, half of the checking is eliminated as well.

I think that this should give excellent performance; I have had correspondence with other people working on similar lines (e.g. from AT&T), and from our limited data it is apparent that you do not pay much more than for a local procedure call (and maybe even less than for an intra address space thread rendezvous, as you don't have synch and thread switch costs).

I have been working (since 1983, on and off... but I have now apparently found a way to switch to full time for this) on something that does 1) by default, but will only do 2) for selected, statically configured modules, and 3) only for the kernel. While this is a less general mechanism than Psyche, I think that the Psyche mechanism is excessively fine grained for my tastes, and I'd rather be more restrictive, and not even offer the option to do 2) and 3) in a general way.
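The first 4.xBSD alternative mentioned earlier (a user-level network server reached through Unix domain sockets) depends on being able to hand a descriptor from one party to another. The following is only a minimal sketch of that mechanism, not code from any of the systems discussed: it assumes a Unix system with Python 3.9 or later, keeps both ends in one process for brevity, and uses a pipe as a stand-in for the outward connection that the network server would own. The names pass_descriptor and demo are made up for illustration.

```python
import os
import socket


def pass_descriptor(fd):
    """Send fd over a Unix-domain socket pair and return the received copy."""
    # server_end plays the network server, client_end plays a user program.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A descriptor travels as ancillary data; some ordinary payload
        # must accompany it, so a short tag is sent alongside.
        socket.send_fds(server_end, [b"fd"], [fd])
        _msg, fds, _flags, _addr = socket.recv_fds(client_end, 1024, 1)
        return fds[0]
    finally:
        server_end.close()
        client_end.close()


def demo():
    # The pipe stands in for a connection to the outside world
    # (an assumption made to keep the sketch self-contained).
    read_end, write_end = os.pipe()
    received = pass_descriptor(write_end)
    # Writing through the received descriptor reaches the same pipe.
    os.write(received, b"hello")
    data = os.read(read_end, 5)
    for d in (read_end, write_end, received):
        os.close(d)
    return data


if __name__ == "__main__":
    print(demo())
```

The receiver ends up with its own descriptor number referring to the same underlying object, which is the property that lets a single server process hold the real network connections.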
There is of course a difference in perspective: I am a minimalist, and I don't want to add mechanisms that are not relevant, or that may even encourage programming styles at variance with my target environment, distributed systems. There it is important to cut overheads, but also to encourage the programmer to be aware of communication boundaries, and not to expect to be able to map address spaces together, as they may be on different machines.

It is *possible* to support transparently distributed shared memory; indeed it is in principle possible and fairly easy with the existing Accent/Mach architecture. But I think that hiding communication boundaries, while attractive from a conceptual point of view, would also hide the underlying reality in terms of cost and reliability. Also, most current machines do not have long enough virtual addresses that you can expect to map many user address spaces together.

The Psyche people have of course a completely different attitude, and I would dare say that to them points 2) and especially 3) are the most important, because their target is a NUMA machine, that is, a not-too-loosely coupled multiprocessor and not a (possibly wide area) distributed system (Mach seems ever more oriented to very closely coupled multiprocessors). On such a machine hardware effort has been expended to provide some credible, efficient illusion of global shared memory, and that should be exploited.

In other words, my reckoning is that while in current Accent/Mach kernels there is a "short-circuit" path as an exception to the normal mechanism, the (sw) architecture should be such that this kind of path is actually the standard.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk