Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!pyramid!pesnta!hplabs!ucbvax!SCRC-QUABBIN.ARPA!DCP
From: DCP@SCRC-QUABBIN.ARPA.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Adaptive SMTP Timeouts
Message-ID: <860603104027.9.DCP@FIREBIRD.SCRC.Symbolics.COM>
Date: Tue, 3-Jun-86 10:40:00 EDT
Article-I.D.: FIREBIRD.860603104027.9.DCP
Posted: Tue Jun 3 10:40:00 1986
Date-Received: Wed, 4-Jun-86 17:05:00 EDT
References: <860602-175604-178@Xerox>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 41
Approved: tcp-ip@sri-nic.arpa

    Date: Mon, 2 Jun 86 12:03:42 PDT
    From: Murray.pa@Xerox.COM

    An idea that we have found very helpful...

    Our mailer keeps outgoing mail sorted by host. Hosts are split
    into two categories: healthy and sick. While there is work to do
    on the healthy queue, the mailer ignores the sick hosts. Whenever
    the mailer empties the healthy queue, it tries the host on the
    front of the sick queue. (If that fails, it gets moved to the end
    of the sick queue.) The idea is to avoid having the mailer bang
    its head against hosts that are known to be causing trouble.

We do something like this. We keep track of up and down hosts, and
when processing mail we skip the hosts that are believed down.
Therefore, we don't concentrate on one particular host (which might
drive that host crazy). I think we do it this way because one message
is often destined for many hosts.

    Occasionally mail to a host that isn't really very sick takes much
    longer than we would like. This happens when the sick queue is very
    long and the mailer is busy, so the sick queue doesn't turn over
    very fast. So far, this hasn't bothered us enough to do anything
    about it.

Obvious solution: Periodically declare sick hosts up, or slightly more
conservatively, declare the host suitable for a probe. If it really
is sick, you'll know soon enough. You only have to do this for one
message. If it isn't sick, you can requeue the tardy messages.
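
The healthy/sick scheme and the periodic probe suggested above can be
sketched roughly as follows. This is only an illustration, not either
site's actual mailer: the class name HostScheduler, its methods, and
the probe_interval parameter are all hypothetical, and time is passed
in explicitly to keep the example simple.

```python
from collections import deque


class HostScheduler:
    """Toy healthy/sick host scheduler (all names are hypothetical)."""

    def __init__(self, probe_interval=600.0):
        self.healthy = deque()   # hosts believed up, in service order
        self.sick = deque()      # hosts believed down, oldest failure first
        self.sick_since = {}     # host -> time it was last marked sick
        self.probe_interval = probe_interval  # assumed tuning knob, seconds

    def add_host(self, host):
        """Register a host that has mail waiting; assume it is healthy."""
        self.healthy.append(host)

    def next_host(self, now):
        """Pick the next host to try, or None if nothing is queued."""
        # Periodic probe: a sick host that has waited long enough gets
        # one attempt even while the healthy queue still has work.
        if self.sick and now - self.sick_since[self.sick[0]] >= self.probe_interval:
            return self.sick.popleft()
        if self.healthy:
            return self.healthy.popleft()
        # Healthy queue is empty: try the host at the front of the sick queue.
        if self.sick:
            return self.sick.popleft()
        return None

    def report(self, host, ok, now, has_more=False):
        """Record the outcome of an attempt on a host returned by next_host."""
        if ok:
            self.sick_since.pop(host, None)
            if has_more:
                self.healthy.append(host)
        else:
            # Failed: the host goes to the end of the sick queue.
            self.sick_since[host] = now
            self.sick.append(host)
```

With probe_interval set, a host that fails once is skipped while
healthy hosts have work, then gets a single probe once the interval
has passed, regardless of how long the sick queue has become.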

    Along the same lines, we also keep mail to a host sorted, but not
    quite chronologically. Whenever the mailer tries to send a message
    and fails, that message gets moved to the end of the queue.
    Occasionally, this lets the rest of the mail get through when one
    particular message is having/causing troubles.

When a message is causing troubles, how long does it take a human to
realize it and take corrective action? If it stayed at the head of the
queue, I can imagine a human would notice sooner, either because no
mail gets through at all, or because the queue for the troublesome
host keeps growing instead of staying at some "respectable" number.
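
A minimal sketch of the per-host queue described above, where a
message that fails is moved to the end so the rest can get through.
The function name drain_host_queue, the send callback, and the
max_attempts cutoff are assumptions made for illustration. The cutoff
in particular is not part of either mailer described in this thread;
it is one possible way to make a troublesome message visible to a
human rather than letting it cycle quietly at the back of the queue.

```python
from collections import deque


def drain_host_queue(messages, send, max_attempts=3):
    """Try each message for one host; move failures to the back.

    `send` is a hypothetical callback returning True on success.
    Returns (delivered, stuck), where `stuck` holds messages that
    failed `max_attempts` times and should be shown to an operator.
    """
    queue = deque((msg, 0) for msg in messages)
    delivered, stuck = [], []
    while queue:
        msg, attempts = queue.popleft()
        if send(msg):
            delivered.append(msg)
        elif attempts + 1 >= max_attempts:
            # Assumed policy: stop retrying and report the message.
            stuck.append(msg)
        else:
            # Failed: requeue at the end so other mail can pass it.
            queue.append((msg, attempts + 1))
    return delivered, stuck
```

For example, with a queue of ["bad", "m1", "m2"] and a send function
that rejects only "bad", the two good messages are delivered and "bad"
ends up in the stuck list. A real mailer would spread the retries over
separate queue runs instead of looping immediately.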