Path: utzoo!attcan!uunet!samsung!sdd.hp.com!apollo!apollo.hp.com!mishkin
From: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Newsgroups: comp.protocols.misc
Subject: Re: RPC Technologies
Keywords: Transports UDP TCP Performance
Message-ID: <1990Sep14.093420@apollo.HP.COM>
Date: 14 Sep 90 13:34:00 GMT
References: <1990Sep5.194621.11656@athena.mit.edu> <1990Sep7.153710@apollo.HP.COM> <142133@sun.Eng.Sun.COM> <1990Sep11.131429@apollo.HP.COM> <142319@sun.Eng.Sun.COM>
Sender: root@apollo.HP.COM
Reply-To: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Organization: Hewlett-Packard Company - Cooperative Object Computing Operation
Lines: 118

In article <142319@sun.Eng.Sun.COM>, vipin@samsun.Sun.COM (Vipin Samar) writes:
>Yes, this is precisely what I am interested in knowing.  Does transparently
>mean "random" here because any of the available protocols
>give the same functionality or is there some implicit ordering? 

It means "random because any of the available protocols give the same
functionality".  Like I said, defining a scheme for making a more
intelligent choice among equivalent servers is an area for future work.

>Yes, I agree this is a tough problem on formalizing these
>heuristics - perhaps after all it is not such a bad idea to give control to
>users in such cases.  One can never really have all the bases covered.

I definitely agree that there should be at least one mode that lets
user policy choices come into play.

>O.K., so you did build the entire TCP layer on top of UDP and
>you added all the code for that in the NCS library.  Has that step really
>solved all the problems or perhaps has just moved the problems from one
>place to another.

Clearly there are SOME problems that just don't go away.  We believe that
many do.

>If one cannot support huge (say 100) TCP connections because of resource
>problem, then you have said that NCS/UDP will be able to
>overcome the resource problem.  I wonder if that is true, because
>building up of entire TCP functionality on top of UDP is going to require
>system resources as well.  Multiplexing the file descriptors may
>solve this problem to some extent, but not completely, because one would
>still need to keep all those buffers hanging around for any decent
>congestion control and rate-flow problems.

Of course.  However, it is also true that NCS RPC's buffers
(unlike those of the kernel implementation of TCP) aren't consuming wired
physical memory, thus giving NCS some more room to play.  And then there's
the wired space in the kernel for TCP connection blocks and any other
miscellaneous TCP overhead.  Further, the nature of the NCS RPC protocol
lets the server simply blow off its pseudo-connections to clients at
any point that a call is not in progress; the pseudo-connection
reestablishment happens automatically and transparently to the application
(and the stubs).
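
To make the idea concrete, here is a minimal sketch (not NCS code) of
a server-side table that discards per-client state whenever no call is
in progress and rebuilds it on demand.  The class name, the soft limit,
and the "established" counter are all hypothetical, chosen only for
illustration.

```python
class PseudoConnectionTable:
    """Hypothetical server-side table of per-client pseudo-connection state."""

    def __init__(self, soft_limit):
        self.soft_limit = soft_limit   # assumed policy knob, not from NCS
        self.entries = {}              # client address -> True if a call is in progress
        self.established = 0           # how many times state was (re)built

    def _reclaim_idle(self):
        # Discard state only for clients with no call in progress.
        for addr in [a for a, busy in self.entries.items() if not busy]:
            del self.entries[addr]

    def begin_call(self, addr):
        if addr not in self.entries:
            if len(self.entries) >= self.soft_limit:
                self._reclaim_idle()
            # State is rebuilt on demand; the caller never sees the loss.
            self.established += 1
        self.entries[addr] = True

    def end_call(self, addr):
        self.entries[addr] = False


table = PseudoConnectionTable(soft_limit=2)
table.begin_call("a")
table.end_call("a")
table.begin_call("b")      # still in progress
table.begin_call("c")      # table is full, so idle client "a" is dropped
table.end_call("c")
table.begin_call("a")      # "a" comes back; its state is simply rebuilt
```

Note that the limit is soft in this sketch: if every entry has a call
in progress, nothing can be reclaimed and the table grows.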

It is also true that even an RPC running on top of a connection-oriented
transport protocol (COTP) has buffer management problems and that they're
arguably HARDER than those of an RPC over a connectionless one (CLTP).
Note that when I say RPC/COTP I mean one that has somewhat more
complicated problems than, say, SunRPC/TCP.  The problematic scenario
is the case where for some reason, the stub is not consuming its input
data stream for some period of time.  In our model, we have a thread
(or set of threads), distinct from the thread running the stub, that's
responsible for actually reading from the connection.  One reason for
this is so that client-generated "cancels" (i.e., requests to abort calls)
can be processed even if the stub/server thread isn't currently in the
RPC runtime.  Thus, there is a layer of buffering between the connection
and the stub.  There is finite space for this buffering.  Once you've
read data from a connection, you can't discard it (unless you want to
oblige the client stubs to be smart enough to restart the call), so
when buffer space runs short you have to stop reading from some of
your open connections.  But then you don't see the "cancel" message.
Without boring people any
more, you can more or less work around these problems, but it's by no
means simple.
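
The dilemma can be shown with a small single-threaded model (a sketch
under simplifying assumptions, not the actual runtime): messages arrive
in order on one connection, the reader acts on a "cancel" as soon as it
reaches one, and data goes into a bounded buffer for the stub.  The
function name and the message tuples are hypothetical.

```python
from collections import deque


def reader_pass(incoming, buffer, capacity):
    """Read from one connection until it is drained or the buffer is full.

    incoming: list of (kind, payload) in arrival order; kind is "data" or "cancel".
    buffer:   list holding data the stub has not consumed yet.
    Returns (cancel_seen, unread_messages).
    """
    pending = deque(incoming)
    cancel_seen = False
    while pending:
        kind, payload = pending[0]
        if kind == "cancel":
            pending.popleft()
            cancel_seen = True      # handled by the reader, not the stub
            continue
        if len(buffer) >= capacity:
            break                   # data once read can't be dropped, so stop reading
        pending.popleft()
        buffer.append(payload)
    return cancel_seen, list(pending)


stream = [("data", "a"), ("data", "b"), ("data", "c"), ("cancel", None)]

small = []
seen_small, unread_small = reader_pass(stream, small, capacity=2)
# The buffer filled before the reader reached the cancel: it stays unread.

large = []
seen_large, unread_large = reader_pass(stream, large, capacity=3)
# With enough room the reader gets through the data and sees the cancel.
```

Because the cancel sits behind unread data in the same byte stream,
the reader cannot reach it without first finding room for that data.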

>Also, retransmissions would be of the entire UDP packet size and not just
>of the size of the MTU of the network.  Or have you worked around this
>problem by reducing the UDP packet size to the MTU size?  I hope not.
>This retransmission problem will happen only during stormy weather, but
>whenever that happens, it sure will let hell break loose.
>TCP backs off very graciously in such circumstances. How does your protocol
>behave in these cases?

We endeavor to send MTU-sized UDP packets and I don't know why you "hope
not".  The general wisdom as I see it seems to be that depending on
IP-level fragmentation is a bad idea.  Of course, it's currently hard
to know the (what's called) "path MTU".  (I refer people to the IETF
"Path MTU Discovery" document for more details.)  So we guess and try
to be conservative and we eagerly await the widespread implementation
of the protocol modifications required to do path MTU discovery.  I think
the IETF has now endorsed them.
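
As a rough illustration of sending MTU-sized packets, the sketch below
splits a payload so that each piece fits in one IP datagram for a
guessed path MTU.  The default of 576 bytes is an assumption made for
this example (it is the datagram size every IP host must be able to
accept), not a figure taken from NCS, and the rpc_header parameter is
a hypothetical placeholder for whatever header the RPC protocol adds.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
UDP_HEADER = 8    # UDP header, in bytes


def split_for_mtu(data, mtu=576, rpc_header=0):
    """Split data into pieces that each fit in one unfragmented datagram.

    mtu and rpc_header are assumptions for illustration, not NCS values.
    """
    room = mtu - IP_HEADER - UDP_HEADER - rpc_header
    if room <= 0:
        raise ValueError("MTU too small for the headers")
    return [data[i:i + room] for i in range(0, len(data), room)]


pieces = split_for_mtu(b"x" * 1500)
sizes = [len(p) for p in pieces]   # [548, 548, 404] with the defaults above
```

Losing one such packet costs a retransmission of that packet alone,
whereas losing one IP fragment of a large UDP datagram costs the whole
datagram.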

>Are all the TCP tricks on top of UDP in the kernel or in the user land?
>If it is in user land, I am not sure how gratifying performance you are going
>to get out of it.

We have no special kernel support.  (Note that there's a version of NCS
that runs in the kernel, but it's only for the benefit of NCS applications
that live in the kernel.)  I can't give you numbers right now.  (I don't
have them and I don't know that I could make them public right now anyway.)
(BTW, I would not be surprised to hear that NCS 1.5's bulk data throughput
is lower than TCP's on some systems.)

However, we believe that doing (what amounts to) a COTP in user space
on top of UDP will be sufficiently rewarding.  Even if it is only as
fast as or slightly slower than TCP, it is rewarding in that it has better
scaling properties, which we care a lot about.  Further, we're basing
much of our opinion on experiments that we've done and on the real-life
experience of the CMU ITC people with their Rx RPC system, which essentially
does the same thing that NCS RPC does.  In any case I take seriously
my responsibility to produce a real evaluation of the wisdom of this
approach once I'm in a position to do the evaluation.

>Another problem with NCS approach is that the user cannot take advantages
>out of any of the new performance improvements of any of the transports.
>They will have to wait for the NCS team to hack those same fixes into their
>software.  In ONC, they need not do anything.

Fair enough.  On the other hand, to get the benefits of performance
improvements in NCS, all you need to do is get a new NCS library, not
a whole new kernel from your vendor (as would likely be required
to get any improved TCP).  My experience (as both vendor and consumer)
has been that getting updated user-mode software to customers is typically
much easier than getting new kernels to them.

--
                    -- Nat Mishkin
                       Cooperative Object Computing Operation
                       Hewlett-Packard Company
                       mishkin@apollo.hp.com