Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!usc!cs.utexas.edu!rice!uw-beaver!cornell!ken
From: ken@gvax.cs.cornell.edu (Ken Birman)
Newsgroups: comp.sys.isis
Subject: ISIS RPC timing figures
Message-ID: <34468@cornell.UUCP>
Date: 20 Nov 89 18:41:45 GMT
Sender: nobody@cornell.UUCP
Reply-To: ken@cs.cornell.edu (Ken Birman)
Distribution: comp
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 79


Some time ago, I promised to post RPC timing figures for ISIS
when run using the new BYPASS facilities.  The table that follows
was generated using the newest and fastest version of this code
at Cornell and should be roughly the same in ISIS V2.0.

The test involved a process that times RPC's to:
	itself; a loop-back with no communication
	a local destination on the same machine using UDP for IPC
	a remote destination (different machine, still using UDP)
The machines are SUN 3/60's running SUN OS 4.0.2 and the C compiler
was run with optimization level 3 (-O3).  Each test measures 100
"runs", each "run" measures the time to do 10 RPC calls.  The
message sent and the reply have the same size, which is is varied
in these tests from 2 bytes to 131k bytes.  Messages have an additional
fixed overhead of about 256 bytes in this test.

A typical line gives the fast observed time, the average, and the slowest
observed time in milliseconds.   The costs associated with UDP send
and recvfrom and select are shown on the extreme right.  We use "sendmsg"
rather than "sendto" and this seems to be unexpected source of overhead.

The bottom line is is that ISIS itself spends about 2.8ms computing, total,
in these runs, and the remainder of the time is spent doing I/O.  The
2.0 figure for the loop-back case is .3ms faster because we don't need to
read a message from the wire and reconstruct it and .5ms because we skip
the ISIS windowed transmission protocol in this case.  By far the
dominent cost factor turns out to be the kernel cost of select, recvfrom,
sendto and sendmsg in these examples.  Dynamic memory allocation, packet
(re)construction and lightweight tasking overheads show up after this.

For example, if you break down the 10ms for the local RPC 2-byte case,
you get about 7.2ms in the IO calls and select, .5 in the windowing code,
.3ms to rebuild the packet, and 2.0 doing task create and building the message
and reply.  Note that one can get a UDP ping-pong for 2-byte packets to run
much faster -- I get a figure of between 4.6 and 5.0ms, but the cost of
sending larger packets (due to ISIS overhead), calling select for two
open file descriptors, and using sendmsg with 3 or 4 "iovector" entries
rather than just 1 causes a signifcant overhead amounting to at least
2.2ms; the cost varies with the packet size.

Bandwidth runs at betwen 280kbytes and 360kbytes/second for the larger
packet sizes.  For a 10Mbit ether, this is pretty respectable (TCP
is faster but has a kernel implementation).

                [ MIN    AVG     MAX]
self      2:    [ 2.0,   2.0,    2.0]
self   2048:    [ 2.0,   2.0,    2.0]
self   4096:    [ 2.0,   2.0,    2.0]
self   8192:    [ 2.0,   2.0,    2.0]
self  16384:    [ 2.0,   2.0,    2.0]
self  32768:    [ 2.0,   2.0,    2.0]
self  65536:    [ 2.0,   2.0,    2.0]
self 131072:    [ 2.0,   2.0,    2.0]
self 262144:    [ 2.0,   2.0,    2.0]
local      2:   [10.0,  10.0,   10.0]
local   2048:   [20.0,  20.0,   20.0]
local   4096:   [22.0,  23.0,   24.0]
local   8192:   [44.0,  44.3,   46.0]
local  16384:   [82.0,  82.0,   84.0]
local  32768:   [146.0, 147.4,  148.0]
local  65536:   [284.0, 284.5,  286.0]
local 131072:   [562.0, 563.1,  566.0]
remote      2:  [12.0,  12.0,   12.0]
remote   2048:  [22.0,  23.8,   24.0]
remote   4096:  [32.0,  32.0,   32.0]
remote   8192:  [58.0,  58.0,   60.0]
remote  16384:  [108.0, 108.9,  110.0]
remote  32768:  [196.0, 197.9,  200.0]
remote  65536:  [374.0, 381.3,  392.0]
remote 131072:  [772.0, 819.7,  888.0]

... we've run up to a megabyte of data at a time, but the bandwidth
actually starts to drop off above 131k because of flow control effects.
I see little point in trying to trim more time off the ISIS part of this
under SUN UNIX, but we might look at these figures again under MACH,
where the local IPC is much faster.  Multicast figures coming soon...

Ken