Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ukma!tut.cis.ohio-state.edu!pt.cs.cmu.edu!andrew.cmu.edu!+
From: Richard.Draves@CS.CMU.EDU
Newsgroups: comp.os.mach
Subject: Re: Mach context switch time
Message-ID:
Date: 15 Oct 89 03:43:44 GMT
References: <2895@netcom.UUCP>
Sender: rpd@M.GP.CS.CMU.EDU
Organization: Carnegie Mellon, Pittsburgh, PA
Lines: 158
In-Reply-To: <2895@netcom.UUCP>

In a following message, I'll post some performance numbers in response to Ken Birman's request for a comparison of Mach IPC and Unix IPC. But first I would like to comment on this program.

> Excerpts from netnews.comp.os.mach: 12-Oct-89 Mach context switch time
> Jonathan Hue@netcom.UUCP (2816)
> #include
> #include
> #include
> #include
> struct mymsg
> {
>     msg_header_t my_header;
>     msg_type_t my_type;
> };
> main(argc, argv)
> int argc;
> char **argv;
> {
>     register int iterations=1;
>     register kern_return_t error;
>     port_t port, newport;
>     unsigned int num_ports;
>     struct mymsg msg;
>     port_t port_set[10];
>     port_array_t new_set;
>     if (argc == 2)
>         iterations = atoi(*++argv);
>     if ((error = port_allocate(task_self(), &port)) != KERN_SUCCESS)
>         mach_error("port_allocate", error);
>     if ((error = port_set_backlog(task_self(), port, 10)) !=
>         KERN_SUCCESS)
>         mach_error("port_set_backlog", error);
>     port_set[0] = port;
>     mach_ports_register(task_self(), port_set, 1);
>     switch (fork())
>     {
>     case -1:
>         perror("fork");
>         exit(1);
>     case 0:
>         mach_ports_lookup(task_self(), &new_set, &num_ports);
>         port = *new_set;
>         msg.my_header.msg_remote_port = port;
>         if ((error = port_allocate(task_self(), &newport)) != KERN_SUCCESS)
>             mach_error("port_allocate", error);
>         msg.my_header.msg_remote_port = port;
>         msg.my_header.msg_local_port = newport;
>         msg.my_header.msg_id = 0xc0ffee;
>         msg.my_header.msg_size = sizeof(msg);
>         msg.my_header.msg_type = MSG_TYPE_NORMAL;
>         msg.my_header.msg_simple = TRUE;
>
>         msg.my_type.msg_type_name = MSG_TYPE_INTEGER_32;
>         msg.my_type.msg_type_size = 32;
>         msg.my_type.msg_type_number = 0;
>         msg.my_type.msg_type_inline = TRUE;
>         msg.my_type.msg_type_longform = FALSE;
>         msg.my_type.msg_type_deallocate = FALSE;
>         while (--iterations != -1)
>         {
>             if ((error = msg_rpc(&(msg.my_header), SEND_SWITCH,
>                     sizeof(msg), 0, 0)) != RPC_SUCCESS)
>                 mach_error("msg_send", error);
>         }
>         exit(0);
>     default:
>         msg.my_header.msg_local_port = port;
>         msg.my_header.msg_size = sizeof(msg);
>         while (--iterations != -1)
>         {
>             if ((error = msg_receive(&(msg.my_header), MSG_OPTION_NONE, 0))
>                 != RCV_SUCCESS)
>                 mach_error("msg_receive", error);
>             if ((error = msg_send(&(msg.my_header), SEND_SWITCH, 0)) !=
>                 SEND_SUCCESS)
>                 mach_error("msg_receive", error);
>         }
>         break;
>     }
>     wait(0);
> }

It is possible to use messages that have no body, just a header, so "struct mymsg" need not include "my_type".

There is no need to use port_set_backlog here, although it certainly doesn't hurt.

My personal preference is to avoid mach_ports_register and mach_ports_lookup when possible. They are a hack that lets tasks acquire some initial send rights, such as a send right for the name service. For your purposes, why not use netname_check_in and netname_look_up? (OK, I can think of one reason you might have used mach_ports_register and mach_ports_lookup: they work in single-user mode, when the name service isn't running. I have a simple name server that I use when the netmsgserver isn't running or available.)

The benchmark has the server include rights for the reply port (newport) in the reply message. Mig sets the msg_local_port field in reply messages to PORT_NULL; this is a little faster because the kernel only has to handle one port in the reply message instead of two.

On most architectures, there is no problem with having the benchmark program fork to get client and server tasks. However, this doesn't work very well on some machines, such as the RT. The problem lies with hardware architectures that don't allow convenient sharing of physical pages. The RT only allows sharing of segments.
The RT pmap module (the machine-dependent VM module) isn't smart enough to figure out that the text pages of the child and parent can be shared by using a single segment; it uses two segments and shuffles the pages back and forth. As a result, on the RT the benchmark takes some faults on every context switch. These are relatively inexpensive faults; they only need to fiddle with the RT's hardware data structures. But you probably don't want to be measuring them. Other architectures (I don't know of any offhand) might suffer from similar problems.

The SEND_SWITCH option for msg_rpc and msg_send is something that NeXT decided to export to users; in Mach 2.5, it is only available internally. It is a scheduling hint. If SEND_SWITCH is used when a message is sent and a receiver is waiting, the kernel context-switches immediately to the receiver. Normally the sender keeps running, and the receiver won't run until normal scheduling picks it. msg_rpc turns on SEND_SWITCH internally, so there is no reason for a NeXT user to pass SEND_SWITCH to msg_rpc. I doubt the SEND_SWITCH on the msg_send is doing much for the benchmark either.

Normally, when doing repeated RPCs with the client using msg_rpc and the server using msg_receive/msg_send, things work as follows. The server is blocked in msg_receive. The client executes msg_rpc. Because SEND_SWITCH is used internally, we switch to the server. The server uses msg_send for the reply. SEND_SWITCH wouldn't do anything here, because the client hasn't gotten to the receive part of its msg_rpc yet. The server executes msg_receive and blocks. The scheduler picks the client, which resumes the msg_rpc and goes to receive the reply. It picks up the reply message sitting there and loops around for another msg_rpc, etc.

With SEND_SWITCH on the msg_send, another mode of operation is possible. The client is blocked in the receive part of the msg_rpc, and the server executes msg_send with SEND_SWITCH, so we switch to the client.
The client loops around and does another msg_rpc. In the send part, the server isn't blocked in its receive yet, so the internal SEND_SWITCH does nothing. The client keeps going and blocks in the receive part again. The scheduler picks the server, which returns from the msg_send, executes the msg_receive (picking up the request that is already queued), and executes the msg_send again, etc.

I expect the time per RPC to be about the same in these two modes, although I haven't checked that. In any case, the benchmark is probably vacillating between them, as scheduling quanta and other events can flip the system from one mode to the other.

Rich