Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!usc!cs.utexas.edu!rice!uw-beaver!cornell!ken From: ken@gvax.cs.cornell.edu (Ken Birman) Newsgroups: comp.sys.isis Subject: ISIS RPC timing figures Message-ID: <34468@cornell.UUCP> Date: 20 Nov 89 18:41:45 GMT Sender: nobody@cornell.UUCP Reply-To: ken@cs.cornell.edu (Ken Birman) Distribution: comp Organization: Cornell Univ. CS Dept, Ithaca NY Lines: 79 Some time ago, I promised to post RPC timing figures for ISIS when run using the new BYPASS facilities. The table that follows was generated using the newest and fastest version of this code at Cornell and should be roughly the same in ISIS V2.0. The test involved a process that times RPC's to: itself; a loop-back with no communication a local destination on the same machine using UDP for IPC a remote destination (different machine, still using UDP) The machines are SUN 3/60's running SUN OS 4.0.2 and the C compiler was run with optimization level 3 (-O3). Each test measures 100 "runs", each "run" measures the time to do 10 RPC calls. The message sent and the reply have the same size, which is is varied in these tests from 2 bytes to 131k bytes. Messages have an additional fixed overhead of about 256 bytes in this test. A typical line gives the fast observed time, the average, and the slowest observed time in milliseconds. The costs associated with UDP send and recvfrom and select are shown on the extreme right. We use "sendmsg" rather than "sendto" and this seems to be unexpected source of overhead. The bottom line is is that ISIS itself spends about 2.8ms computing, total, in these runs, and the remainder of the time is spent doing I/O. The 2.0 figure for the loop-back case is .3ms faster because we don't need to read a message from the wire and reconstruct it and .5ms because we skip the ISIS windowed transmission protocol in this case. By far the dominent cost factor turns out to be the kernel cost of select, recvfrom, sendto and sendmsg in these examples. Dynamic memory allocation, packet (re)construction and lightweight tasking overheads show up after this. For example, if you break down the 10ms for the local RPC 2-byte case, you get about 7.2ms in the IO calls and select, .5 in the windowing code, .3ms to rebuild the packet, and 2.0 doing task create and building the message and reply. Note that one can get a UDP ping-pong for 2-byte packets to run much faster -- I get a figure of between 4.6 and 5.0ms, but the cost of sending larger packets (due to ISIS overhead), calling select for two open file descriptors, and using sendmsg with 3 or 4 "iovector" entries rather than just 1 causes a signifcant overhead amounting to at least 2.2ms; the cost varies with the packet size. Bandwidth runs at betwen 280kbytes and 360kbytes/second for the larger packet sizes. For a 10Mbit ether, this is pretty respectable (TCP is faster but has a kernel implementation). [ MIN AVG MAX] self 2: [ 2.0, 2.0, 2.0] self 2048: [ 2.0, 2.0, 2.0] self 4096: [ 2.0, 2.0, 2.0] self 8192: [ 2.0, 2.0, 2.0] self 16384: [ 2.0, 2.0, 2.0] self 32768: [ 2.0, 2.0, 2.0] self 65536: [ 2.0, 2.0, 2.0] self 131072: [ 2.0, 2.0, 2.0] self 262144: [ 2.0, 2.0, 2.0] local 2: [10.0, 10.0, 10.0] local 2048: [20.0, 20.0, 20.0] local 4096: [22.0, 23.0, 24.0] local 8192: [44.0, 44.3, 46.0] local 16384: [82.0, 82.0, 84.0] local 32768: [146.0, 147.4, 148.0] local 65536: [284.0, 284.5, 286.0] local 131072: [562.0, 563.1, 566.0] remote 2: [12.0, 12.0, 12.0] remote 2048: [22.0, 23.8, 24.0] remote 4096: [32.0, 32.0, 32.0] remote 8192: [58.0, 58.0, 60.0] remote 16384: [108.0, 108.9, 110.0] remote 32768: [196.0, 197.9, 200.0] remote 65536: [374.0, 381.3, 392.0] remote 131072: [772.0, 819.7, 888.0] ... we've run up to a megabyte of data at a time, but the bandwidth actually starts to drop off above 131k because of flow control effects. I see little point in trying to trim more time off the ISIS part of this under SUN UNIX, but we might look at these figures again under MACH, where the local IPC is much faster. Multicast figures coming soon... Ken