Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!tut.cis.ohio-state.edu!pt.cs.cmu.edu!andrew.cmu.edu!<UNAUTHENTICATED>+
From: Richard.Draves@CS.CMU.EDU
Newsgroups: comp.os.mach
Subject: Re: Mach context switch time
Message-ID: <IZBznk_00hYPEcB0pl@cs.cmu.edu>
Date: 15 Oct 89 03:43:44 GMT
References: <2895@netcom.UUCP>
Sender: rpd@M.GP.CS.CMU.EDU
Organization: Carnegie Mellon, Pittsburgh, PA
Lines: 158
In-Reply-To: <2895@netcom.UUCP>

In a following message, I'll post some performance numbers in response
to Ken Birman's request for a comparison of Mach IPC and Unix IPC.  But
first I would like to comment on this program.

> Excerpts from netnews.comp.os.mach: 12-Oct-89 Mach context switch time
> Jonathan Hue@netcom.UUCP (2816)

> #include <sys/types.h>
> #include <stdio.h>
> #include <mach.h>
> #include <sys/message.h>

> struct mymsg
> {
>     msg_header_t	my_header;
>     msg_type_t		my_type;
> };

> main(argc, argv)
> int argc;
> char **argv;
> {
>     register int iterations=1;
>     register kern_return_t error;
>     port_t port, newport;
>     unsigned int num_ports;
>     struct mymsg msg;
>     port_t port_set[10];
>     port_array_t new_set;

>     if (argc == 2)
> 	iterations = atoi(*++argv);
>     if ((error = port_allocate(task_self(), &port)) != KERN_SUCCESS)
> 	mach_error("port_allocate", error);
>     if ((error = port_set_backlog(task_self(), port, 10)) != KERN_SUCCESS)
> 	mach_error("port_set_backlog", error);
>     port_set[0] = port;
>     mach_ports_register(task_self(), port_set, 1);
>     switch (fork())
>     {
> 	case -1:
> 	    perror("fork");
> 	    exit(1);
> 	case 0:
> 	    mach_ports_lookup(task_self(), &new_set, &num_ports);
> 	    port = *new_set;
> 	    msg.my_header.msg_remote_port = port;
> 	    if ((error = port_allocate(task_self(), &newport)) != KERN_SUCCESS)
> 		mach_error("port_allocate", error);
> 	    msg.my_header.msg_remote_port = port;
> 	    msg.my_header.msg_local_port = newport;
> 	    msg.my_header.msg_id = 0xc0ffee;
> 	    msg.my_header.msg_size = sizeof(msg);
> 	    msg.my_header.msg_type = MSG_TYPE_NORMAL;
> 	    msg.my_header.msg_simple = TRUE;
> 	    
> 	    msg.my_type.msg_type_name = MSG_TYPE_INTEGER_32;
> 	    msg.my_type.msg_type_size = 32;
> 	    msg.my_type.msg_type_number = 0;
> 	    msg.my_type.msg_type_inline = TRUE;
> 	    msg.my_type.msg_type_longform = FALSE;
> 	    msg.my_type.msg_type_deallocate = FALSE;
> 	    while (--iterations != -1)
> 	    {
> 		if ((error = msg_rpc(&(msg.my_header), SEND_SWITCH,
> 				     sizeof(msg), 0, 0)) != RPC_SUCCESS)
> 		    mach_error("msg_rpc", error);
> 	    }
> 	    exit(0);
> 	default:
> 	    msg.my_header.msg_local_port = port;
> 	    msg.my_header.msg_size = sizeof(msg);
> 	    while (--iterations != -1)
> 	    {
> 		if ((error = msg_receive(&(msg.my_header), MSG_OPTION_NONE, 0))
> 		    != RCV_SUCCESS)
> 		    mach_error("msg_receive", error);
> 		if ((error = msg_send(&(msg.my_header), SEND_SWITCH, 0)) !=
> 		     SEND_SUCCESS)
> 		    mach_error("msg_send", error);
> 	    }
> 	    break;
>     }
>     wait(0);
> }
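For the Unix side of the comparison, a rough analogue of the same fork-based ping-pong can be built from two pipes: the forked child plays the client and the parent plays the server, exchanging one byte per round trip. This is a hypothetical sketch (run_pingpong is a made-up name), not the quoted program and not necessarily how any posted numbers were obtained. It does no timing itself; wrapping the call with gettimeofday is one way to get a per-round-trip figure.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Unix-pipe analogue of the quoted benchmark (hypothetical sketch).
 * The forked child is the client: it writes a one-byte request and
 * blocks until a one-byte reply arrives. The parent is the server: it
 * reads a request and writes a reply. Returns the number of round
 * trips the server completed, or -1 on error.
 */
static int run_pingpong(int iterations)
{
    int to_server[2], to_client[2];
    int i, status, done = 0;
    char byte = 0;
    pid_t pid;

    if (pipe(to_server) == -1)
        return -1;
    if (pipe(to_client) == -1) {
        close(to_server[0]);
        close(to_server[1]);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        perror("fork");
        close(to_server[0]);
        close(to_server[1]);
        close(to_client[0]);
        close(to_client[1]);
        return -1;
    }
    if (pid == 0) {
        /* Client: send a request, then block until the reply arrives. */
        close(to_server[0]);
        close(to_client[1]);
        for (i = 0; i < iterations; i++)
            if (write(to_server[1], &byte, 1) != 1 ||
                read(to_client[0], &byte, 1) != 1)
                _exit(1);
        _exit(0);
    }
    /* Server: block until a request arrives, then send the reply. */
    close(to_server[1]);
    close(to_client[0]);
    for (i = 0; i < iterations; i++) {
        if (read(to_server[0], &byte, 1) != 1 ||
            write(to_client[1], &byte, 1) != 1)
            break;
        done++;
    }
    close(to_server[0]);
    close(to_client[1]);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return done;
}
```

Each round trip costs two pipe writes, two pipe reads, and (on a uniprocessor) at least two context switches, which is roughly the work the Mach version does with msg_rpc on one side and msg_receive/msg_send on the other.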


It is possible to use messages which have no body, just a header.  So
"struct mymsg" need not include "my_type".

There is no need to use port_set_backlog here, although it certainly
doesn't hurt.

My personal preference is to avoid mach_ports_register and
mach_ports_lookup when possible.  They are a hack that lets tasks
acquire some initial send rights, such as a send right to the name
service.  For your purposes, why not use netname_check_in and
netname_look_up?

(OK, I can think of a reason you might have used mach_ports_register
and mach_ports_lookup.  They work in single-user mode, when the name
service isn't up.  I have a simple name server which I use when the
netmsgserver isn't running or available.)

The benchmark has the server include rights for the reply port (newport)
in the reply message.  Mig sets the msg_local_port field of reply
messages to PORT_NULL; this is a little faster because the kernel only
has to handle one port in the reply message instead of two.
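A minimal sketch of that difference, using hypothetical stand-in types (toy_port_t, TOY_PORT_NULL, struct toy_header) so it compiles without the Mach headers. The quoted server sends the header it just received straight back, so both port fields are still set; a Mig-style reply keeps the destination (the client's reply port) and clears msg_local_port.

```c
/* Hypothetical stand-ins for port_t and PORT_NULL, so this sketch
 * compiles without Mach headers. */
typedef int toy_port_t;
#define TOY_PORT_NULL ((toy_port_t) 0)

/* Only the two header fields that can carry ports. */
struct toy_header {
    toy_port_t msg_remote_port;   /* destination of the message */
    toy_port_t msg_local_port;    /* extra port carried along, or null */
};

/* How many ports the kernel has to process for this header. */
static int ports_in_header(const struct toy_header *h)
{
    return (h->msg_remote_port != TOY_PORT_NULL) +
           (h->msg_local_port != TOY_PORT_NULL);
}

/* Mig-style reply: go to the requester's reply port, carry nothing else. */
static struct toy_header mig_style_reply(const struct toy_header *request)
{
    struct toy_header reply;

    reply.msg_remote_port = request->msg_remote_port;
    reply.msg_local_port = TOY_PORT_NULL;
    return reply;
}
```

Reusing a received header with both fields set gives ports_in_header() == 2; the Mig-style reply built from it gives 1.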

On most architectures, there is no problem with having the benchmark
program fork to get client and server tasks.  However, this doesn't work
very well on some machines, like RTs.  The problem lies with hardware
architectures that don't allow convenient sharing of physical pages.
The RT only allows sharing of segments.  The RT pmap module
(machine-dependent VM module) isn't smart enough to figure out that the
text pages of the child and parent can be shared by using a single
segment; it uses two segments and shuffles the pages back and forth.
As a result, on the RT the benchmark takes some faults on every context
switch.  These are relatively inexpensive faults; they just need to
fiddle with the RT's hardware data structures.  But you probably don't
want to be measuring them.  Other architectures might suffer from
similar problems, though I don't know of any offhand.

The SEND_SWITCH option for msg_rpc and msg_send is something that NeXT
decided to export to the user; in Mach 2.5, it is only available
internally.  It is a scheduling hint.  If SEND_SWITCH is used when a
message is sent, and a receiver is waiting, then the kernel will
context-switch immediately to the receiver.  Normally the sender keeps
running, and the receiver won't run until normal scheduling picks it.

msg_rpc turns on SEND_SWITCH internally, so there is no reason for a
NeXT user to use SEND_SWITCH with msg_rpc.  I doubt the SEND_SWITCH on
the msg_send is doing much for the benchmark either.  Normally, when
doing repeated RPCs with the client using msg_rpc and the server using
msg_receive/msg_send, things work as follows.  The server is blocked in
msg_receive.  The client executes msg_rpc.  Because SEND_SWITCH is
used internally, we switch to the server.  The server uses msg_send for
the reply.  SEND_SWITCH wouldn't do anything here, because the client
hasn't gotten to the receive part of its msg_rpc yet.  The server
executes msg_receive and blocks.  The scheduler picks the client, which
resumes the msg_rpc and goes to receive the reply.  It picks up the
reply message sitting there and loops around for another msg_rpc, etc.

With SEND_SWITCH on the msg_send, another mode of operation is possible.
The client is blocked in the receive part of the msg_rpc, and the
server executes msg_send with SEND_SWITCH, so we switch to the client.
The client loops around and does another msg_rpc.  In the send part, the
server isn't blocked in its receive yet, so the internal SEND_SWITCH
does nothing.  The client keeps going and blocks in the receive part
again.  The scheduler picks the server, which comes out of the msg_send,
executes the msg_receive, and executes the msg_send again, etc.
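To make the two modes concrete, here is a toy single-CPU model of the loop described in the last two paragraphs. It is a hypothetical sketch, not Mach code: there are only two threads, no quanta and no preemption, and all names (simulate, step, reply_handoffs, and so on) are made up. msg_rpc's internal SEND_SWITCH is always on; the SEND_SWITCH on the server's msg_send is a parameter, and the starting state selects which mode the loop begins in.

```c
#include <stddef.h>

/* Toy single-CPU model of repeated RPCs: the client loops on msg_rpc,
 * the server loops on msg_receive/msg_send. Not Mach code: two threads,
 * no quanta, no preemption. msg_rpc's internal SEND_SWITCH is always on. */
enum { CLIENT, SERVER };
enum { C_SEND, C_RECV };   /* client: send part / receive part of msg_rpc */
enum { S_RECV, S_SEND };   /* server: msg_receive / msg_send */

struct sim {
    int pc[2];             /* next step of each thread */
    int blocked[2];        /* 1 while blocked in a receive */
    int running;           /* thread holding the CPU */
    int requests;          /* messages queued on the server's port */
    int replies;           /* messages queued on the client's reply port */
    int switches;          /* context switches so far */
    int reply_handoffs;    /* replies that switched straight to the client */
    int rpcs;              /* RPCs completed by the client */
};

static void switch_to(struct sim *s, int t)
{
    s->running = t;
    s->switches++;
}

/* Block the running thread and run the other; -1 if both are blocked. */
static int block_running(struct sim *s)
{
    int other = 1 - s->running;

    s->blocked[s->running] = 1;
    if (s->blocked[other])
        return -1;
    switch_to(s, other);
    return 0;
}

/* Execute one step of whichever thread is running. */
static int step(struct sim *s, int server_send_switch)
{
    if (s->running == CLIENT) {
        if (s->pc[CLIENT] == C_SEND) {
            s->requests++;
            s->pc[CLIENT] = C_RECV;
            if (s->blocked[SERVER]) {       /* msg_rpc's internal SEND_SWITCH */
                s->blocked[SERVER] = 0;
                switch_to(s, SERVER);
            }
        } else if (s->replies > 0) {
            s->replies--;
            s->rpcs++;
            s->pc[CLIENT] = C_SEND;
        } else {
            return block_running(s);
        }
    } else if (s->pc[SERVER] == S_RECV) {
        if (s->requests == 0)
            return block_running(s);
        s->requests--;
        s->pc[SERVER] = S_SEND;
    } else {
        s->replies++;
        s->pc[SERVER] = S_RECV;
        if (s->blocked[CLIENT]) {
            s->blocked[CLIENT] = 0;         /* client becomes runnable */
            if (server_send_switch) {       /* SEND_SWITCH on the reply */
                s->reply_handoffs++;
                switch_to(s, CLIENT);
            }
        }
    }
    return 0;
}

/* Run until the client has completed n RPCs. If start_blocked is SERVER,
 * the server starts blocked in msg_receive (first mode); if CLIENT, the
 * client starts blocked waiting for a reply the server is about to send
 * (second mode). Returns the number of context switches, -1 on deadlock. */
static int simulate(int start_blocked, int server_send_switch, int n,
                    int *reply_handoffs)
{
    struct sim s = {{0}};

    s.blocked[start_blocked] = 1;
    if (start_blocked == SERVER) {
        s.pc[CLIENT] = C_SEND;
        s.pc[SERVER] = S_RECV;
        s.running = CLIENT;
    } else {
        s.pc[CLIENT] = C_RECV;
        s.pc[SERVER] = S_SEND;
        s.running = SERVER;
    }
    while (s.rpcs < n)
        if (step(&s, server_send_switch) != 0)
            return -1;
    if (reply_handoffs != NULL)
        *reply_handoffs = s.reply_handoffs;
    return s.switches;
}
```

In this toy, starting with the server blocked, 100 RPCs take 200 context switches and no reply ever hands off to the client, whether or not the server's msg_send uses SEND_SWITCH. Starting with the client blocked and SEND_SWITCH on, 100 RPCs take 199 switches (the first RPC was already half done) and every reply hands off. Both modes settle at two switches per RPC and differ only in which send does the hand-off. With the server's SEND_SWITCH off, the second starting state falls back into the first mode after one RPC.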

I expect the time for an RPC is about the same in these two modes,
although I haven't checked that.  In any case, the benchmark probably
vacillates between them, since scheduling quanta and other events can
flip the system from one mode to the other.

Rich