Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!ucbvax!geof@decwrl.DEC.COM@imagen.UUCP
From: geof@decwrl.DEC.COM@imagen.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: When to acknowledge SMTP messages
Message-ID: <8702270038.AA00078@apolling.imagen.uucp>
Date: Thu, 26-Feb-87 19:38:41 EST
Article-I.D.: apolling.8702270038.AA00078
Posted: Thu Feb 26 19:38:41 1987
Date-Received: Sat, 28-Feb-87 06:15:48 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: imagen!geof@decwrl.DEC.COM
Distribution: world
Organization: The ARPA Internet
Lines: 99
Approved: tcp-ip@sri-nic.arpa

> > The server should NOT make the client wait while a message is
> > being delivered...
>
> I faced this issue when implementing our mail relay.  I decided that the
> client SMTP would have to wait while the relay delivered the message.
> Otherwise, the relay could acknowledge the message and then crash
> or discover that the destination mail server was unable to take the message.
> Either way, the mail goes on the floor, hardly desirable.  Acknowledgement
> should mean that the message is really okay.

I agree whole-heartedly.  The problem is with SMTP itself.  TCP mandates
that it is the client's responsibility (that is, the higher-level
protocol's) to ensure that the remote end is up.  In other words, TCP
won't probe an idle connection (the old "keep-alive" discussion), so the
higher-level protocol must do so if it cares.  This behavior on TCP's
part is necessary to cope with potentially expensive network paths
(e.g., a PTT network that bills by the packet), so that quiescent TCP
connections do not run up big bills.  If you're out of the office for
lunch, you don't want your telnet connection to send packets around
uselessly for an hour or more.  As in most cases, it doesn't matter much
when you're on an Ethernet, but it does in the more general case.

In the case of SMTP, when a message is terminated with a ".CRLF", no
SMTP data may flow except the server's success/fail response.
Since the TCP connection is quiescent during this interval, TCP cannot
detect a remote crash.  The only reasonable thing to do is to have SMTP
set its own death timer when it sends ".CRLF" and hope the message can
be delivered during that time interval.

The trouble is that there is no way to judge how long the SMTP death
timer should be.  Some machines deliver mail fast, others not so fast
(mine is just plain slow).  No matter what value you set for the death
timer, you lose some of the time.  And the way you lose is that mail to
one type of host is always lousy.

The ultimate answer would be to fix SMTP, so that the server could still
respond with "OK, I'm still here" messages while it was delivering the
mail.  Given all the SMTP hosts out there, this is probably not going to
happen.

Ad hoc solutions include:

1. Have the server respond before the message is sent (bad, since
   messages can get dropped on the floor).

2. Adjust the timeouts to try to accommodate every host you would
   reasonably connect to => every TCP implementation.  This is what we
   do now, and it doesn't work all the time.

3. Find some random data for the message sender to periodically queue.
   This would have the effect of taking the TCP connection out of its
   quiescent state, so that the TCP layer can detect a machine crash
   for you.  This works unless the problem is that the remote SMTP
   server is in a tight loop, with the remote TCP still healthy (that's
   a "software bug"-type situation that can be detected and fixed).

I favor [3].  Try this:

    When you send ".CRLF":
        set timer for how long you expect this to take (T)
        set timer for how long you are willing to hang (D >> T)
        set noops=0
        wait for input from server

    On TIMER T:
        send NOOP command to server
        noops = noops + 1
        set timer to T
        go back to waiting for input from server

    On INPUT:
        process success/fail message from SMTP SEND command
        while noops > 0 do
            read & discard reply from server
            noops = noops - 1
        end

    On TIMER D:
        assume failure of message.

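For concreteness, here is a rough Python sketch of the loop above.  It
assumes a connected socket on which the terminating "." line has just
been sent.  The names ReplyReader, await_data_reply, t_probe and
d_giveup are made up for this illustration and do not come from any
real SMTP library; treat it as one possible reading of the pseudocode,
not a finished client.

```python
import select
import socket
import time


class ReplyReader:
    """Hypothetical helper: buffers socket data and yields whole SMTP replies."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = b""
        self.lines = []  # lines of a multi-line reply still in progress

    def read_reply(self, deadline):
        """Return one reply as a list of lines, or None if deadline passes."""
        while True:
            while b"\r\n" in self.buf:
                raw, self.buf = self.buf.split(b"\r\n", 1)
                self.lines.append(raw.decode("ascii", "replace"))
                # "250-text" continues a reply; anything else ends it.
                if raw[3:4] != b"-":
                    reply, self.lines = self.lines, []
                    return reply
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            ready, _, _ = select.select([self.sock], [], [], remaining)
            if not ready:
                return None
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            self.buf += chunk


def await_data_reply(sock, t_probe, d_giveup):
    """Wait for the reply to the message, sending NOOP every t_probe seconds.

    Returns (reply_lines, noops_sent). Raises TimeoutError after d_giveup
    seconds (timer D); an OSError from sendall() means TCP noticed a dead peer.
    """
    reader = ReplyReader(sock)
    give_up = time.monotonic() + d_giveup  # timer D
    noops = 0
    while True:
        next_probe = min(time.monotonic() + t_probe, give_up)  # timer T
        reply = reader.read_reply(next_probe)
        if reply is not None:
            break
        if time.monotonic() >= give_up:
            raise TimeoutError("no reply to message within D")
        sock.sendall(b"NOOP\r\n")  # gives TCP something to retransmit
        noops += 1
    # The message reply arrives first; discard one reply per NOOP sent.
    for _ in range(noops):
        if reader.read_reply(give_up) is None:
            raise TimeoutError("missing reply to NOOP")
    return reply, noops
```

A caller would invoke await_data_reply(sock, 120.0, 3600.0) or similar
right after writing the final "." line; the two values here are only
placeholders for T and D.  The sketch always sends the NOOP when T
expires and does not try to ask the local TCP whether it is quiescent.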
The idea is that by sending NOOP commands, the TCP layer will probe the
underlying connection for you.  Thus, the ultimate timer, D, can be VERY
long, since it detects bugs in the remote SMTP, not random events.  The
annoyance is that you have to ignore enough responses to match each noop
you sent (I guess the other annoyance is that it is a miserable hack
that should be shot at sunrise...).

An obvious enhancement is to query the local TCP before sending a NOOP
-- it is not necessary to send anything unless the local TCP is
quiescent.  This is extremely useful in the situation where the SMTP
connection is dribbling along at 1200 baud somewhere and the REAL
problem is that the message hasn't been TRANSMITTED yet.

The timer T should be long enough to give the other machine a running
shot at delivering the message in that time (say 1-5 minutes).

- Geof