Path: utzoo!attcan!uunet!cs.utexas.edu!rutgers!soleil!mlb.semi.harris.com!thrush.mlb.semi.harris.com!del
From: del@thrush.mlb.semi.harris.com (Don Lewis)
Newsgroups: comp.protocols.tcp-ip.domains
Subject: Re: BIND bug list
Message-ID: <1990May30.094653.8584@mlb.semi.harris.com>
Date: 30 May 90 09:46:53 GMT
References: <1990May17.083447.6880@mlb.semi.harris.com> <25358@netnews.upenn.edu>
Sender: news@mlb.semi.harris.com
Organization: Harris Semiconductor, Melbourne FL
Lines: 56

In article <25358@netnews.upenn.edu> hagan@DCCS.UPENN.EDU (John Dotts Hagan) writes:
>
>Anyways, I think it would be real neat if the resolver did some kind of
>performance/reliability remembering when going at its list of possible name
>servers to use.
>
>As it is now, we have three name servers for our campus (one is primary, and
>two secondaries).  One of the secondaries is listed first in everyone's
>resolv.conf (or equivalent list of servers), and then the primary, and then
>the second secondary.
>
>When the first listed secondary dies (either named dumps core and leaves, or
>the system is toasted), everyone's resolver gets slow since it patiently tries
>to query the first listed name server, then after a timeout moves on to the
>next one, and so forth.  However, it does not remember that it just had some
>trouble with the first server, and tries it again for the next request.

You might want to list each of these servers first in the resolv.conf on
one third of the hosts in order to better distribute the load.  This way,
only 1/3rd of the hosts will slow down when one of the servers dies (but
this will happen three times as often).
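
Something along these lines could generate the rotated files.  This is
only a sketch: the function names, the 192.0.2.x addresses, the host names
and the example.edu domain are all made up for illustration, and it assumes
the usual "domain" and "nameserver" lines in resolv.conf.

```python
def nameserver_order(host_index, servers):
    """Rotate the server list so that host i starts with servers[i % n]."""
    shift = host_index % len(servers)
    return servers[shift:] + servers[:shift]


def resolv_conf(host_index, servers, domain):
    """Build the text of a resolv.conf for the host at position host_index."""
    lines = ["domain " + domain]
    lines += ["nameserver " + addr
              for addr in nameserver_order(host_index, servers)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical campus servers and hosts, for illustration only.
    servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    for i, host in enumerate(["alpha", "beta", "gamma", "delta"]):
        print("# " + host)
        print(resolv_conf(i, servers, "example.edu"))
```

Every host still lists all three servers, so nothing is lost when one
dies; only the order in which they are tried differs from host to host.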

>
>It would be great if the first user who tries a telnet (or whatever) suffered
>the hit once for that host, then other tries would quickly just go at a working
>name server.  Perhaps dead name servers could be routinely retried and some
>stats kept on them (I think bind already does this sort of thing when dealing
>with the list of root servers, so at least there is some precedent for this
>kind of behavior).
>
Well, there is sort of a problem here.  You probably have no such thing
as *the* resolver.  Everything that you run that wants to do host<->address
translation uses the resolver library routines and is a separate instance
of a resolver.  Each time you fire up telnet, it starts up from scratch
and has no history available concerning the status of the various servers.
If a particular process does a lot of host<->address translations, then it
probably could figure out what was going on and make use of this
information, but if it only does one translation, by the time it figures
out which server is the hot one to use, it has no further need to use it.
I suppose that you could read this information from a file and update it,
but then you have to be able to handle multiple simultaneous accesses and
updates to this file 8-(
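
For the long-running case, here is a rough sketch of what "remembering"
might look like inside a single process.  It is not how the BIND resolver
library works; the ServerTracker class, the 60 second holddown, and the
send() callback (which is assumed to raise TimeoutError when a server does
not answer) are all hypothetical.

```python
import time


class ServerTracker:
    """Remember which name servers timed out recently, within one process."""

    def __init__(self, servers, holddown=60.0, clock=time.monotonic):
        self.servers = list(servers)
        self.holddown = holddown      # seconds to avoid a server after a timeout
        self.clock = clock
        self.dead_until = {}          # server -> time before which it is tried last

    def candidates(self):
        """Servers in configured order, with recently failed ones moved last."""
        now = self.clock()
        live = [s for s in self.servers if self.dead_until.get(s, 0.0) <= now]
        held = [s for s in self.servers if s not in live]
        return live + held

    def query(self, name, send):
        """Try each candidate in turn; send(server, name) is supplied by the caller."""
        for server in self.candidates():
            try:
                answer = send(server, name)
            except TimeoutError:
                # Remember the failure so the next lookup skips this server first.
                self.dead_until[server] = self.clock() + self.holddown
                continue
            self.dead_until.pop(server, None)
            return answer
        raise TimeoutError("no name server answered for " + name)
```

Note that a held-down server is moved to the end of the list rather than
dropped, so it still gets tried if everything else fails, and it returns
to its normal position once the holddown expires.  All of this state
vanishes when the process exits, which is exactly the problem for
something like telnet.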

If you are running a somewhat modern BIND (>4.8?), then you can run it
on each host and configure it to forward all its queries to the campus
servers.  BIND appears not to keep track of the performance of its
forwarders, so I suppose it would be better if it did something like
what it does for the root servers.  Running BIND on each host also has
the advantage that the answers to frequently asked questions are cached
locally on the host which will reduce the load on the campus servers.
Be forewarned that the forwarding stuff doesn't quite work right even in
4.8.1.  Hopefully there will be a cleaner release soon.
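
A per-host boot file for this kind of setup might look roughly like the
following.  Treat it as a sketch from memory of the 4.8 named.boot
directives: the directory, the cache file name, and the 192.0.2.x
forwarder addresses are placeholders, so check the documentation that
came with your release.

```
; named.boot for a caching-only server that forwards to the campus servers
directory   /etc/namedb
cache       .   root.cache
; placeholder addresses for the three campus name servers
forwarders  192.0.2.1 192.0.2.2 192.0.2.3
; rely on the forwarders only, rather than going to the root servers
slave
```

With this in place, resolv.conf on the host would point at the local
server first.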
--
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901