Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!mit-eddie!apollo!apollo.hp.com!mishkin
From: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Newsgroups: comp.protocols.misc
Subject: Re: RPC Technologies
Keywords: Transports UDP TCP Performance
Message-ID: <1990Sep18.163702@apollo.HP.COM>
Date: 18 Sep 90 20:37:00 GMT
References: <1990Sep5.194621.11656@athena.mit.edu> <1990Sep7.153710@apollo.HP.COM> <1990Sep14.093420@apollo.HP.COM> <142630@sun.Eng.Sun.COM>
Sender: root@apollo.HP.COM
Reply-To: mishkin@apollo.HP.COM (Nathaniel Mishkin)
Organization: Hewlett-Packard Company - Cooperative Object Computing Operation
Lines: 152

In article <142630@sun.Eng.Sun.COM>, vipin@samsun.Sun.COM (Vipin Samar) writes:
>So, we have agreed on one thing that it is a good idea to have an
>escape hatch for programmers/users to be able to choose a
>particular transport. Any ideas as to how will you extend NCS to do it?

I don't have any concrete proposal. However, let me point out some
things. There are two kinds of choices one might want to make: (1)
choose which protocol stack (by which I mean a generalization of your
scheme that lets you choose which transport you want) you want to use,
and (2) choose which SERVER you want, among a set of equivalent servers.

The first kind of choice is addressed in part by your environment
variable. (I think it's more likely what you'd want to specify is a
ranking of stacks; i.e., "I want to use RPC/UDP/IP if the server has it,
otherwise I'll accept RPC/ROSE/.../TP4, otherwise ...".)

The second choice is a little trickier. You might want to choose based
on the number of hops to the server or the class of machine it is or how
busy it is. Basically, NCS 2.0 has three functions: "import", "lookup",
and "select". (These aren't the exact names of course.) "import"
basically just calls "lookup" followed by "select". "lookup" returns a
set of "compatible servers" -- i.e., ones that you have a prayer of
being able to talk with (i.e., they've registered support for the
interface you're interested in and they share all the necessary
communications protocols with you). "select" picks one server out of a
set returned by "lookup", currently at random. The idea is that most
clients will just use "import" and trust that "select" does an OK thing
and that in the future maybe it will do even more OK things. However,
sophisticated clients are free to call "lookup" and select a server
based on whatever criteria the application writer feels like applying,
including looking at environment variables and reading configuration
files.

>I am assuming that you blow away the connection state only when you have the
>guarantee that the data has reached the other side.

Yes.

>TCP also frees
>up all the associated buffer space in such cases. The only difference
>would be in the number of mbuf's (2?)reqd to keep the state/sequencing.
>Streaming/UDP becomes a win, only if one knows beforehand that there will be
>hundreds of connections for that particular RPC service.
> Even when one is using TCP, the server can blow away the
>connection (if the call is not in progress) and the client will simply
>have to rebind (at some extra cost). The client RPC library should
>provide the required transparency.

This is in essence what NCS 2.0 does when running over a COTP. Note
that there are some slightly tricky details having to do with what it
looks like to the client in the case where it's launching a "request"
over the connection at exactly the same time the server is closing the
connection. The client might have a bit of a hard time distinguishing
the network close from a server crash and can't know whether the call
might have been executed and as a result not know whether it's safe --
i.e., meets the "at most once" rule -- to try to execute the call again
at the same or another server.

>The question is which is better -
>pay huge cost at each call (discussed in earlier messages) OR pay
>rebinding cost only when it is required? In my opinion, second approach
>is better.

What's the huge cost? Anyway, I wouldn't write off the rebinding cost.
The TCP disconnect and connect are going to cost around 6 network
messages. A busy server is going to have to close connections fairly
regularly. (Of course NCS has some analogous overhead, but (a) it's
lower because there are no close messages, and (b) the authentication
setup is piggy-backed on the connection setup messages. Any
authentication messages [challenge/response] would be yet more overhead
messages in a TCP-based scheme.)

Note also that I don't know how small you can reduce the
"state/sequencing" info in TCP and I don't know how big your mbufs are,
but in NCS/RPC we can keep literally just (the equivalent of) the state
value, connection ID, and sequence number and we're using the
generalized user space heap storage package that has few restrictions
(e.g., in the granularity of the allocation size). We can keep a LOT of
connections' state.

>If I assume that both the threads are share the buffer space, then what do
>you mean by "finite space for this buffering"? Also, the problem
>described above seems more like a problem created by your implementation.

It may seem that way, but I think it's not. It all falls out from the
requirement of being able to cancel an in-progress call. I believe this
makes it problematic to let the thread that's doing the unmarshalling
actually be the thread that's reading from the connection. Thus, a
layer of buffering is required. There is some fixed number of buffers
you will be willing to assign to this task. Once this space is
exhausted you have a problem.

>Also, this problem should exist for CLTP too. How did you handle that case?

RPC/CLTP can always discard data that it's read from the network because
the sender never discards the data until the data has been ack'd by the
receiver. When the receiver discards data, it doesn't ack it, inducing
the sender to retransmit the data. (The details include the fact that
the receiver will set his "offered window" value in the ack's it sends
to zero.)

>>We endeavor to send MTU-sized UDP packets and I don't know why you "hope
>>not".
>
>Because I want good performance and that can happen only by minimizing
>transitions between kernel and user land. Choosing a large number is
>bad for stormy cases & retransmissions and leads to overrun for machines
>with fewer mbuf's or less memory. Choosing a small number for UDP packet
>size is bad for performance. Sun decided to go with a compromise - 8K
>UDP packet size.

I understand the penalties of kernel/user transitions, but I also
understand the sending of 8K UDP datagrams to be a dangerous proposition
since if any one IP fragment is lost, you are obliged to retransmit ALL
the fragments since there's no mechanism for indicating which fragment
was lost.

>In NCS, is there a mechanism for size negotiation
>for faster/bigger machines (lots of memory) which can easily handle
>bigger size UDP packets?

This is not a machine size/speed issue. It has to do with the network.
If there are 5 gateways between the sender and the receiver and they're
busy, the odds that one of the fragments will be lost could well be
unacceptably high (esp. considering the cost of a loss will be the load
of sending ALL the fragments again). I'll leave it for the Internet
experts to make some "official" statement (if such a thing is possible),
but I've gotten the strong feeling that depending on IP fragmentation is
a bad idea.

>Just curious - what do you guys do for NFS? Do you use 1K packets only?

I'm the wrong person to ask about NFS. I don't have anything to do with
it.
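
To put rough numbers on the fragmentation argument made earlier in this
reply, here is a back-of-the-envelope sketch in Python. Everything in
it is an assumption made for illustration and none of it is a
measurement: a 1500-byte Ethernet MTU, a 20-byte IP header with no
options, an 8-byte UDP header, losses that are independent at each
gateway, and a made-up 1% drop rate per gateway. The function names
are hypothetical.

```python
import math


def fragments(datagram_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed for a UDP datagram with this much user data."""
    # Fragment offsets are expressed in 8-byte units, so the data carried
    # by each fragment except the last must be a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((datagram_bytes + udp_header) / per_fragment)


def datagram_loss(n_fragments, per_hop_loss, hops):
    """Probability that at least one fragment is dropped somewhere on the path.

    Assumes every gateway drops each packet independently with the same
    probability, which real congested networks do not promise.
    """
    fragment_survives = (1.0 - per_hop_loss) ** hops
    return 1.0 - fragment_survives ** n_fragments


if __name__ == "__main__":
    for size in (1024, 8192):
        n = fragments(size)
        p = datagram_loss(n, per_hop_loss=0.01, hops=5)
        print(f"{size} bytes: {n} fragment(s), loss probability {p:.3f}")
```

Under those assumptions an 8K datagram becomes 6 fragments and is lost
about 26% of the time across 5 gateways, against about 5% for a
datagram that fits in a single packet, and each loss costs a resend of
all 6 fragments.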

>Actually, you can easily rest all of my fears by posting some performance
>ratios for UDP/TCP (not numbers - I know they are confidential)

I don't have numbers right now.

>> In any case I take seriously
>>my responsibility to produce a real evaluation of the wisdom of this
>>approach once I'm in a position to do the evaluation.
>
>I hope that performance will be one of the criterian.

Yes, but it's important to remember that it's only one of the criteria --

--
Nat Mishkin
Cooperative Object Computing Operation
Hewlett-Packard Company
mishkin@apollo.hp.com