Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!mit-eddie!apollo!apollo.hp.com!mishkin
From: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Newsgroups: comp.protocols.misc
Subject: Re: RPC Technologies
Keywords: Transports UDP TCP Performance
Message-ID: <1990Sep18.163702@apollo.HP.COM>
Date: 18 Sep 90 20:37:00 GMT
References: <1990Sep5.194621.11656@athena.mit.edu> <1990Sep7.153710@apollo.HP.COM> <1990Sep14.093420@apollo.HP.COM> <142630@sun.Eng.Sun.COM>
Sender: root@apollo.HP.COM
Reply-To: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Organization: Hewlett-Packard Company - Cooperative Object Computing Operation
Lines: 152

In article <142630@sun.Eng.Sun.COM>, vipin@samsun.Sun.COM (Vipin Samar) writes:
>So, we have agreed on one thing that it is a good idea to have an
>escape hatch for programmers/users to be able to choose a
>particular transport.  Any ideas as to how will you extend NCS to do it?

I don't have any concrete proposal.  However, let me point out some things.

There are two kinds of choices one might want to make:  (1) choose which
protocol stack you want to use (a generalization of your scheme, which
lets you choose a transport), and (2) choose which SERVER you want among
a set of equivalent servers.  The first kind of choice is addressed in
part by your environment variable.  (I think what you'd more likely want
to specify is a ranking of stacks; i.e., "I want to use RPC/UDP/IP if
the server has it, otherwise I'll accept RPC/ROSE/.../TP4, otherwise
...".)  The second choice is a little
trickier.  You might want to choose based on the number of hops to the
server or the class of machine it is or how busy it is.

NCS 2.0 has three functions:  "import", "lookup", and "select".  (These
aren't the exact names, of course.)  "import" just calls "lookup"
followed by "select".  "lookup" returns a set of "compatible servers";
i.e., ones that you have a prayer of being able to talk with (they've
registered support for the interface you're interested in and they share
all the necessary communications protocols with you).
"select" picks one server out of a set returned by "lookup", currently
at random.  The idea is that most clients will just use "import" and
trust that "select" does an OK thing and that in the future maybe it
will do even more OK things.  However, sophisticated clients are free
to call "lookup" and select a server based on whatever criteria the
application writer feels like applying, including looking at environment
variables and reading configuration files.
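
To make the division of labor concrete, here is a rough sketch in
Python.  It is NOT NCS code:  the names (Server, registry, import_,
select_by_ranking) and the data layout are made up for illustration.
It only shows the shape of "lookup" then "select", plus a client-supplied
policy that honors a ranking of protocol stacks.

```python
import random


class Server:
    """Hypothetical registry entry; not an NCS data structure."""
    def __init__(self, name, stacks):
        self.name = name
        self.stacks = set(stacks)   # protocol stacks the server registered


def lookup(registry, interface, client_stacks):
    """Return servers that export `interface` and share a stack with us."""
    return [s for s in registry.get(interface, [])
            if s.stacks & set(client_stacks)]


def select(servers, rng=random):
    """Default policy: pick one compatible server at random."""
    return rng.choice(servers) if servers else None


def import_(registry, interface, client_stacks):
    """What most clients would call: lookup followed by select."""
    return select(lookup(registry, interface, client_stacks))


def select_by_ranking(servers, ranking):
    """Custom policy: take the first server offering the best-ranked stack."""
    for stack in ranking:
        for s in servers:
            if stack in s.stacks:
                return s, stack
    return None


# Made-up registry contents, keyed by interface name.
registry = {"calendar": [Server("a", ["RPC/TCP/IP"]),
                         Server("b", ["RPC/UDP/IP", "RPC/TCP/IP"]),
                         Server("c", ["RPC/ROSE/.../TP4"])]}
```

A sophisticated client would call lookup() itself and then apply
select_by_ranking() (or any other rule it likes) with a ranking read
from an environment variable or a configuration file.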

>I am assuming that you blow away the connection state only when you have the
>guarantee that the data has reached the other side.  

Yes.

>TCP also frees
>up all the associated buffer space in such cases.  The only difference
>would be in the number of mbuf's (2?)reqd to keep the state/sequencing.
>Streaming/UDP becomes a win, only if one knows beforehand that there will be
>hundreds of connections for that particular RPC service.
>	Even when one is using TCP, the server can blow away the
>connection (if the call is not in progress) and the client will simply
>have to rebind (at some extra cost).  The client RPC library should
>provide the required transparency.  

This is in essence what NCS 2.0 does when running over a COTP.  Note
that there are some slightly tricky details having to do with what
it looks like to the client in the case where it's launching a "request"
over the connection at exactly the same time the server is closing
the connection.  The client might have a hard time distinguishing the
network close from a server crash.  It then can't know whether the call
was executed, and so can't know whether it's safe (i.e., meets the "at
most once" rule) to try the call again at the same or another server.

>The question is which is better -
>pay huge cost at each call (discussed in earlier messages) OR pay
>rebinding cost only when it is required?  In my opinion, second approach
>is better.

What's the huge cost?  Anyway, I wouldn't write off the rebinding cost.
The TCP disconnect and connect are going to cost around 6 network messages.
A busy server is going to have to close connections fairly regularly.
(Of course NCS has some analogous overhead, but it's lower because
(a) there are no close messages, and (b) the authentication setup is
piggy-backed on the connection setup messages.  Any authentication
messages [challenge/response] would be yet more overhead messages in a
TCP-based scheme.)

Note also that I don't know how far you can shrink the "state/sequencing"
info in TCP and I don't know how big your mbufs are, but in NCS/RPC we
can keep literally just (the equivalent of) the state value, connection
ID, and sequence number.  We also use the generalized user-space heap
storage package, which has few restrictions (e.g., on the granularity of
the allocation size).  We can keep a LOT of connections' state.

>If I assume that both the threads are share the buffer space, then what do
>you mean by "finite space for this buffering"?  Also, the problem
>described above seems more like a problem created by your implementation.

It may seem that way, but I think it's not.  It all falls out from the
requirement of being able to cancel an in-progress call.  I believe this
makes it problematic to let the thread that's doing the unmarshalling
actually be the thread that's reading from the connection.  Thus, a layer
of buffering is required.  There is some fixed number of buffers you
will be willing to assign to this task.  Once this space is exhausted
you have a problem.
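
As an illustration only (this is not how NCS is written), the buffering
layer can be pictured as a bounded queue sitting between the thread that
reads the connection and the thread that unmarshals.  MAX_BUFFERS and
reader_step are names I've invented for the sketch.

```python
import queue

MAX_BUFFERS = 4   # assumed value for the "fixed number of buffers"


def reader_step(buffers, packet):
    """Run by the thread that reads the connection.

    Returns True if the packet was buffered for the unmarshalling
    thread, or False if every buffer is in use (the problem case).
    """
    try:
        buffers.put_nowait(packet)
        return True
    except queue.Full:
        return False


buffers = queue.Queue(maxsize=MAX_BUFFERS)
# The unmarshalling thread is not draining the queue here, so the
# reader runs out of buffers after MAX_BUFFERS packets.
accepted = [reader_step(buffers, "pkt%d" % i) for i in range(6)]
```

Once reader_step() starts returning False, the reader has data in hand
and nowhere to put it.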

>Also, this problem should exist for CLTP too.  How did you handle that case?

RPC/CLTP can always discard data that it's read from the network because
the sender never discards the data until the data has been ack'd by the
receiver.  When the receiver discards data, it doesn't ack it, inducing
the sender to retransmit the data.  (The details include the fact that
the receiver sets the "offered window" value to zero in the acks it
sends.)
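
Here is a toy model of that behavior in Python.  It is a sketch under my
own simplifying assumptions (in-order delivery only, one ack per
fragment, made-up class and method names), not the NCS wire protocol.
The point is just that a discarded fragment is never acked, so the
sender still has it.

```python
class Receiver:
    """Receiver with a fixed number of buffers (hypothetical model)."""
    def __init__(self, nbuffers):
        self.capacity = nbuffers
        self.held = []            # fragments waiting to be unmarshalled
        self.next_expected = 0    # next in-order sequence number

    def on_fragment(self, seq, data):
        """Accept or discard a fragment; return (highest acked seq, window)."""
        if seq == self.next_expected and len(self.held) < self.capacity:
            self.held.append(data)
            self.next_expected += 1
        # Otherwise the fragment is dropped and NOT acked.
        window = self.capacity - len(self.held)
        return self.next_expected - 1, window

    def consume(self):
        """The unmarshalling thread takes a fragment, freeing a buffer."""
        return self.held.pop(0)


class Sender:
    """Sender that keeps every fragment until the receiver acks it."""
    def __init__(self, fragments):
        self.unacked = dict(enumerate(fragments))
        self.window = 1

    def on_ack(self, acked_seq, window):
        for seq in [s for s in self.unacked if s <= acked_seq]:
            del self.unacked[seq]
        self.window = window


rx = Receiver(nbuffers=2)
tx = Sender(["f0", "f1", "f2", "f3"])
for seq in (0, 1, 2):             # fragment 2 arrives with no free buffer
    tx.on_ack(*rx.on_fragment(seq, tx.unacked[seq]))
# Now tx.window == 0 and fragments 2 and 3 are still held by the sender.
```

When the receiver later frees a buffer, the sender simply retransmits
fragment 2 from tx.unacked; nothing was lost by discarding it.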

>>We endeavor to send MTU-sized UDP packets and I don't know why you "hope
>>not".
>
>Because I want good performance and that can happen only by minimizing
>transitions between kernel and user land.  Choosing a large number is
>bad for stormy cases & retransmissions and leads to overrun for machines
>with fewer mbuf's or less memory.  Choosing a small number for UDP packet
>size is bad for performance.  Sun decided to go with a compromise - 8K
>UDP packet size.  

I understand the penalties of kernel/user transitions, but I also
understand the sending of 8K UDP datagrams to be a dangerous proposition
since if any one IP fragment is lost, you are obliged to retransmit
ALL the fragments since there's no mechanism for indicating which
fragment was lost.

>In NCS, is there a mechanism for size negotiation
>for faster/bigger machines (lots of memory) which can easily handle
>bigger size UDP packets?  

This is not a machine size/speed issue.  It has to do with the network.
If there are 5 gateways between the sender and the receiver and they're
busy, the odds that one of the fragments will be lost could well be
unacceptably high (esp. considering the cost of a loss will be the load
of sending ALL the fragments again).
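
A back-of-the-envelope calculation shows the effect.  The sketch below
assumes a 1500-byte MTU (Ethernet), 20-byte IP and 8-byte UDP headers,
and independent fragment losses with the same probability; all of those
are my assumptions for illustration, not measurements, and the function
names are made up.

```python
import math


def fragments_needed(payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed for one UDP datagram."""
    # Fragment data must be a multiple of 8 bytes (except the last one).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload_bytes + udp_header) / per_fragment)


def datagram_loss(p_fragment, n_fragments):
    """Chance the datagram is lost: any single lost fragment kills it."""
    return 1.0 - (1.0 - p_fragment) ** n_fragments


def expected_fragments_sent(p_fragment, n_fragments):
    """Expected fragments transmitted until one whole datagram arrives,
    given that every retry resends all of the fragments."""
    return n_fragments / (1.0 - p_fragment) ** n_fragments


n = fragments_needed(8192)        # an 8K datagram under the assumptions above
loss = datagram_loss(0.01, n)     # with an assumed 1% loss per fragment
```

Under these assumptions an 8K datagram is 6 fragments, and a 1%
per-fragment loss rate becomes nearly a 6% loss rate for the datagram
as a whole, each loss costing all 6 fragments again.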

I'll leave it for the Internet experts to make some "official" statement
(if such a thing is possible), but I've gotten the strong feeling that
depending on IP fragmentation is a bad idea.

>Just curious - what do you guys do for NFS?  Do you use 1K packets only?

I'm the wrong person to ask about NFS.  I don't have anything to do with
it.

>Actually, you can easily rest all of my fears by posting some performance
>ratios for UDP/TCP (not numbers - I know they are confidential)

I don't have numbers right now.

>> In any case I take seriously
>>my responsibility to produce a real evaluation of the wisdom of this
>>approach once I'm in a position to do the evaluation.
>
>I hope that performance will be one of the criterian.

Yes, but it's important to remember that it's only one of the criteria.

--
                    -- Nat Mishkin
                       Cooperative Object Computing Operation
                       Hewlett-Packard Company
                       mishkin@apollo.hp.com