Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!ucbvax!cs.brown.edu!jb
From: jb@cs.brown.edu.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Domain host TTL fields
Message-ID: <8702262115.AA02105@ucbvax.Berkeley.EDU>
Date: Thu, 26-Feb-87 14:04:02 EST
Article-I.D.: ucbvax.8702262115.AA02105
Posted: Thu Feb 26 14:04:02 1987
Date-Received: Sat, 28-Feb-87 02:54:55 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 54
Approved: tcp-ip@sri-nic.arpa

Over time, my idea of what the optimum TTL should be has been increasing.
In general, I feel that 24 hours is about the correct value.  One major
issue is how long other software will keep retrying while it waits for a
change to appear.  Sendmail will attempt to deliver a message for 3 days
(as distributed), so one would like any change to be seen in less than
3 days.

There are a couple of reasons for data to change.  The first is a planned
change to the network configuration.  This can be prepared for by reducing
the TTL ahead of time.  Don't forget that the reduction must be made at
least one full (old) TTL before the change, since caches may hold the old
records that long.  Consider how far in advance you would be planning
a move.  The second reason is an unanticipated failure.  If one of your
primary machines (such as a mail forwarder) goes down for a few days, any
change made to bypass the failure takes a full TTL to be seen everywhere.
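
As a rough illustration of the timing rule for a planned change, here is a
minimal Python sketch.  The function name and the example dates are made up
for illustration; the only rule it encodes is that the TTL has to be lowered
at least one old TTL before the change takes place.

```python
from datetime import datetime, timedelta

def latest_ttl_reduction(change_time, old_ttl):
    """Latest moment at which the TTL can be lowered so that every cached
    copy carrying the old (long) TTL has expired by change_time."""
    return change_time - old_ttl

# Hypothetical example: a move planned for 1 April 1987 at 09:00, with
# records currently published with a 24 hour TTL.
move = datetime(1987, 4, 1, 9, 0)
deadline = latest_ttl_reduction(move, timedelta(hours=24))
print(deadline)  # 1987-03-31 09:00:00
```

The same subtraction shows why a long TTL hurts after an unplanned failure:
there is no chance to lower it beforehand, so the old data can linger for up
to the full TTL.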

Coming from Berkeley and being involved with some of the early distributions
of BIND, I'll admit we made a mistake in what we had in the sample files.
Many people just copied our samples and did not analyze the situation.  Our
samples should have had TTLs that were longer than 1 hour.  We did not
realize this originally ourselves and were guilty of using too short
a TTL for a long time.  These problems take time to work out.

As for the question of what should be used as the timeout when waiting for
a reply, I'm not sure what the correct answer is.  There are 3 timeouts
to consider in this case.  First, the total time to wait for any response
before indicating a failure.  Second, the time between trying different
servers for the domain.  And third, the time between tries to the same
server.

The first of these is a user interface question on one hand, and a performance
issue on the other.  How long should a user who tries to telnet to some host
have to wait before being told that the host is unknown (possibly only
temporarily)?  I don't like to wait a long time, but on the other hand,
the longer the wait, the more likely the lookup is to succeed.  BIND is
currently using about one minute for this.

The other two are intertwined and are also a part of the first one.  UDP,
which is what is primarily used for queries, is not reliable.  If one knows
that the original packet was lost, then a retry to one of the servers is in
order.  If instead the reply is only slow because of a long network round
trip time (RTT), then the time between retries should be lengthened.
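
One way to lengthen the retry interval as the round trip time grows is to
borrow the smoothed RTT estimate that TCP uses for its retransmission timer
(RFC 793).  The Python sketch below is only an illustration of that idea; it
is not taken from BIND, and the function names, the constants alpha and beta,
and the lower and upper bounds are all assumptions.

```python
def update_srtt(srtt, sample, alpha=0.875):
    """Exponentially weighted average of round trip time samples (seconds)."""
    return alpha * srtt + (1.0 - alpha) * sample

def retry_timeout(srtt, beta=2.0, lower=1.0, upper=60.0):
    """Retry interval: a multiple of the smoothed RTT, clamped to bounds."""
    return min(upper, max(lower, beta * srtt))

srtt = 0.5                      # assumed initial estimate, in seconds
for sample in (0.5, 4.0, 4.0):  # made-up measurements: the path gets slower
    srtt = update_srtt(srtt, sample)
print(retry_timeout(srtt))      # longer than the 1.0 second it started at
```

A server that answers slowly but reliably then gets more time per try,
while the bounds keep one bad sample from pushing the interval to extremes.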

To decide what these times should be, several questions need to be answered.
How long should the user wait for a response?  How many queries total
should be sent out in trying to resolve the name?  How many queries should
be made to each server for the domain?  What should the retry algorithm
be (linear, exponential, something else)?  If recursion is being done
by another process, how does that affect these values?
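
To make these questions concrete, the Python sketch below lays out one
possible retry plan.  It is not what BIND does; the function, its parameters,
and the server names are hypothetical.  The total wait is the budget, the
number of queries per server is tries_per_server, and the backoff argument
chooses between a linear and an exponential schedule.  For simplicity it uses
the same timeout when moving to a different server as when retrying the
same one.

```python
def retry_schedule(servers, tries_per_server, first_timeout, budget,
                   backoff="exponential"):
    """Return a list of (server, timeout) pairs, in the order queries would
    be sent.  Each round asks every server once; the timeout grows from
    round to round, and the plan stops when the total budget is used up."""
    plan = []
    elapsed = 0.0
    for attempt in range(tries_per_server):
        if backoff == "exponential":
            timeout = first_timeout * 2 ** attempt
        elif backoff == "linear":
            timeout = first_timeout * (attempt + 1)
        else:
            raise ValueError("unknown backoff: %r" % (backoff,))
        for server in servers:
            if elapsed + timeout > budget:
                return plan
            plan.append((server, timeout))
            elapsed += timeout
    return plan

# Hypothetical servers; 60 seconds matches the "about one minute" total.
for server, timeout in retry_schedule(["ns1", "ns2"], 3, 4.0, 60.0):
    print(server, timeout)
```

With two servers, three tries each, and a 4 second first timeout, the
exponential plan sends six queries and waits 56 seconds in the worst case;
the linear plan sends the same six queries in 48 seconds.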

I'm not sure what is being used in BIND at the moment.  It actually uses
two different algorithms: one for talking to the local server, and another
for dealing with recursion.  Some work on the algorithms has been done for
the most recent release, but I haven't had a chance to look at the code.

					Jim Bloom