Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!ucbvax!geof@decwrl.DEC.COM@imagen.UUCP
From: geof@decwrl.DEC.COM@imagen.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: When to acknowledge SMTP messages
Message-ID: <8702270038.AA00078@apolling.imagen.uucp>
Date: Thu, 26-Feb-87 19:38:41 EST
Article-I.D.: apolling.8702270038.AA00078
Posted: Thu Feb 26 19:38:41 1987
Date-Received: Sat, 28-Feb-87 06:15:48 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: imagen!geof@decwrl.DEC.COM
Distribution: world
Organization: The ARPA Internet
Lines: 99
Approved: tcp-ip@sri-nic.arpa


 >  > The server should NOT make the client wait while a message is
 >  > being delivered...
 >      
 >  I faced this issue when implementing our mail relay.  I decided that the
 >  client SMTP would have to wait while the relay delivered the message.
 >  Otherwise, the relay could acknowledge the message and then crash
 >  or discover that the destination mail server was unable to take the message.
 >  Either way, the mail goes on the floor, hardly desirable.  Acknowledgement
 >  should mean that the message is really okay.

I agree whole-heartedly.  The problem is with SMTP itself.  TCP makes it
the responsibility of its client (the higher-level protocol) to ensure
that the remote end is up.  In other words, TCP won't probe an idle
connection (the old "keep-alive" discussion), so the higher level
protocol must do so if it
cares.  This behavior on TCP's part is necessary to cope with potentially
expensive network paths (e.g., a PTT network that bills by the packet),
so that quiescent TCP connections do not run up big bills.  If you're out of
the office for lunch, you don't want your telnet connection to send
packets around uselessly for an hour or more.  As in most cases, it
doesn't matter much when you're on an Ethernet, but it does in the more
general case.

In the case of SMTP, when a message is terminated with a ".CRLF", no
SMTP data may flow except the server's success/fail response.  Since
the TCP connection is quiescent during this interval, TCP cannot detect
a remote crash.  The only reasonable thing to do is to have SMTP set its
own death timer when it sends ".CRLF" and hope the message can be
delivered during that time interval.
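
A minimal sketch of such a death timer in Python, assuming a plain
blocking socket.  The name wait_for_reply and the timeout values are
made up for illustration; a real client would also have to cope with a
reply that arrives in several pieces.

```python
import socket
from typing import Optional


def wait_for_reply(sock: socket.socket, death_timeout: float) -> Optional[bytes]:
    """Wait up to death_timeout seconds for the server's reply after ".CRLF".

    Returns whatever bytes arrived first, or None if the death timer
    expired with nothing received (hypothetical helper, not a library call).
    """
    sock.settimeout(death_timeout)
    try:
        return sock.recv(512)
    except socket.timeout:
        # Nothing arrived in time: the caller must assume the message failed.
        return None
```

Whatever value is passed as death_timeout is a guess about the remote
host, which is exactly the difficulty described next.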

The trouble is that there is no way to judge how long the SMTP death timer
should be.  Some machines deliver mail fast, others not so fast (mine
is just plain slow).  No matter what value you set for the death timer,
you lose some of the time.  And the way you lose is that mail to one type of
host is always lousy.

The ultimate answer would be to fix SMTP, so that the server could still
respond with "OK, I'm still here" messages while it was delivering the
mail.  Given all the SMTP hosts out there, this is probably not going to
happen.  Ad hoc solutions include:

   1. Have the server respond before the message is delivered (bad, since
      messages can get dropped on the floor).

   2. Adjust the timeouts to try to accommodate every host you would
      reasonably connect to => every TCP implementation.  This is
      what we do now, and it doesn't work all the time.

   3. Find some random data for the message sender to periodically
      queue.  This would have the effect of taking the TCP connection
      out of its quiescent state, so that the TCP layer can detect
      a machine crash for you.  This works unless the problem is
      that the remote SMTP server is in a tight loop, with the remote
      TCP still healthy (that's a "software bug"-type situation that
      can be detected and fixed).

I favor [3].  Try this:

    When you send ".CRLF":
        set timer for how long you expect this to take (T)
        set timer for how long you are willing to hang (D >> T)
        set noops=0
        wait for input from server
    
    On TIMER T:
        send NOOP<CRLF> command to server
        noops = noops + 1
        set timer to T
        go back to waiting for input from server

    On INPUT:
        process success/fail reply to the message (the ".CRLF")
        while noops > 0 do
            read & discard reply from server
            noops = noops - 1
            end

    On TIMER D:
        assume failure of message.

The idea is that sending NOOP commands makes the TCP layer
probe the underlying connection for you.  Thus, the ultimate
timer, D, can be VERY long, since it detects bugs in the remote
SMTP, not random events.  The annoyance is that you have to
ignore enough responses to match each noop you sent (I guess
the other annoyance is that it is a miserable hack that should
be shot at sunrise...).
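
For concreteness, here is one way the bookkeeping above might look in
Python.  This is only a sketch: the class and method names are invented,
the caller is assumed to own the real timers and the socket, and bytes
to transmit are simply collected in a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NoopKeepalive:
    """Client state after sending ".CRLF" (all names are illustrative)."""
    noops: int = 0                  # NOOPs sent whose replies are still unread
    result: Optional[str] = None    # reply to the message, or "FAILED"
    outbox: List[bytes] = field(default_factory=list)  # bytes for the caller to send

    def on_timer_t(self) -> None:
        """Timer T fired with no reply yet: queue a NOOP.  Caller re-arms T."""
        if self.result is None:
            self.outbox.append(b"NOOP\r\n")
            self.noops += 1

    def on_timer_d(self) -> None:
        """Timer D fired: give up on the message if it is still unanswered."""
        if self.result is None:
            self.result = "FAILED"

    def on_reply(self, line: str) -> None:
        """The first reply answers the message; later ones answer NOOPs."""
        if self.result is None:
            self.result = line
        elif self.noops > 0:
            self.noops -= 1         # read & discard one NOOP reply

    @property
    def drained(self) -> bool:
        """True once the message is resolved and every NOOP reply is consumed."""
        return self.result is not None and self.noops == 0
```

The connection should not be reused for another message until drained
is true, since leftover NOOP replies would otherwise be mistaken for
replies to later commands.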

An obvious enhancement is to query the local TCP before sending
a NOOP -- it is not necessary to send anything unless the local
TCP is quiescent.  This is extremely useful in the situation
where the SMTP connection is dribbling along at 1200 baud somewhere
and the REAL problem is that the message hasn't been TRANSMITTED yet.
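
The check itself is tiny.  A sketch in Python, assuming the caller can
somehow learn how many bytes the local TCP still has queued and how long
the connection has been idle (how to obtain those numbers is
platform-specific and not shown; the function and parameter names are
hypothetical):

```python
def should_send_noop(unsent_bytes: int, idle_seconds: float, t: float) -> bool:
    """Decide whether a NOOP probe is worth sending when timer T fires.

    unsent_bytes: data the local TCP has accepted but not yet had acknowledged
    idle_seconds: time since anything was last sent or received
    t:            the probe interval T
    """
    # If data is still in flight, TCP is already exercising the path and
    # will notice a dead peer by itself; a NOOP would only add to the queue.
    return unsent_bytes == 0 and idle_seconds >= t
```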

The timer T should be long enough to give the other machine a
running shot at delivering the message in that time (say 1-5
minutes).

- Geof