Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!ucla-cs!rutgers!njin!brisco
From: brisco@pilot.njin.net (Thomas P. Brisco)
Newsgroups: comp.protocols.tcp-ip.domains
Subject: Re: Experimental DNS RFC (another one: CIP))
Message-ID: <Apr.16.01.29.15.1991.28941@pilot.njin.net>
Date: 16 Apr 91 05:29:16 GMT
References: <9104082258.AA26036@aggie.ucdavis.edu> <9104140321.AA05594@jessica.stanford.edu>
Distribution: inet
Organization: NJ InterCampus Network, New Brunswick, N.J.
Lines: 149


In <9104140321.AA05594@jessica.stanford.edu>, almquist@JESSICA.STANFORD.EDU ("Philip Almquist") says
[...]
>	The purpose of the CIP record is to allow a generic name to
>refer to multiple, functionally equivalent hosts.  When a DNS server
>receives request for the address of such a generic name, it synthesizes
>an A record for the generic name giving the address of one of the real
>machines.  There are various ways that the DNS server could conceivably
>choose which of the real machines to return information about.  The
>proposal decrees that the server should use a particular mechanism, a
>weighted round robin scheme.
>
>	Something that makes the CIP record rather extraordinary is that
>it is basically just a directive telling the server how to function.  A
>server will not normally includes a CIP record in any response that it
>sends.  This suggests to me that there might be alternative, non-
>protocol mechanisms which accomplish the same purpose.

    The proposal doesn't explicitly mandate round robin; it only
notes that my first implementations used it.  The weighting is
the more salient aspect of the RR.
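
    To illustrate what "weighting" can mean in practice, here is a
minimal sketch of one possible weighted selection scheme (a "smooth"
weighted round robin).  It is not taken from the CIP proposal or from
any implementation of it; the host names, the weights, and the
function name weighted_cycle are all hypothetical.

```python
import itertools


def weighted_cycle(members):
    """Yield host names so that each appears in proportion to its weight.

    members: list of (host, weight) pairs with positive integer weights.
    On every step each host gains its weight as credit, the host with
    the most credit is picked (ties go to the earlier entry), and the
    picked host's credit is reduced by the total weight.
    """
    total = sum(weight for _, weight in members)
    credit = {host: 0 for host, _ in members}
    while True:
        for host, weight in members:
            credit[host] += weight
        pick = max(members, key=lambda m: credit[m[0]])[0]
        credit[pick] -= total
        yield pick


# Hypothetical cluster: hosta should receive three answers for every
# one that hostb receives.
cluster = [("hosta.example.edu", 3), ("hostb.example.edu", 1)]
first_eight = list(itertools.islice(weighted_cycle(cluster), 8))
```

    Over any full cycle (four answers here) each host is returned
exactly as often as its weight says.  A server could just as well
pick at random with the same weights; only the proportions matter.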

    On the second paragraph, the key is that the servers _will_
pass the cluster information on, but only under the correct
conditions.  The information will be passed on to secondaries,
etc.  I prefer to think of it as a "polymorphic" record - it
seems to change slightly depending on the method of access.
Sometimes it looks like a CNAME (in additional processing),
sometimes it looks like an MX (weighting), and sometimes it is an
RR unto itself.  The CIP doesn't impose a whole lot of reasoning
upon the RR - the records aren't returned based upon some dynamic
knowledge of the system (indeed, that belongs in a special purpose
nameserver).  It is a "lightweight" version of what you describe.
Sometimes the distribution of resources is sufficiently grey so
that only an approximation of the loading is necessary (is a 
machine loaded because a lot of processes are disk bound? cpu 
bound? number of logins are at a maximum?) or beneficial.

>	Indeed, the problem of how to have a generic name for multiple
>equivalent hosts was addressed by the DNS Working Group some time ago.
>The group found that no new record types were needed or desirable.  CMU
>(and probably other places) already do just what the author of the CIP
>proposal wants to be able to do, without any extensions to the DNS
>protocol.
>
>	How do they do it?  They delegate authority for the generic name
>to a special nameserver.  When that special nameserver gets a request
>for the address associated with the generic name, it creates and returns
>an A record claiming that the address associated with the generic name
>is the address of one of the real hosts.  (Actually, there is no real
>reason to have the special nameserver be separate from the regular one,
>except that it simplified implementation).

    However, there is no way of passing around the knowledge that
some name is actually a cluster, and not a single address.  Suppose
that, at your site, you run about 5 secondary nameservers (it 
seems you have geographically noncontiguous campuses) and need
several nameservers to act autonomously, but each must still hand
out addresses in such a way that some level of "load sharing"
occurs over a series of hosts.  There is no defined way of dispersing
the "clusterness" (gak) of a group of machines.

    In fact, you could replicate an entire second set of "load
knowledgeable nameservers" (sentient, in my terms), around your
campuses.  But then, there are a lot of files to update, a lot
of extra daemons, etc, etc.

    Note: I don't rule out the possibility that a load knowledgeable
nameserver could utilize the CIP records from a non-sentient
nameserver.  I would hope that people would use this aspect of
them (asking for a type "any").  Using the CIP records, a
sentient server could differentiate between series of clusters,
and single clusters.  Some additional language would be necessary
in order to tell a sentient nameserver where one cluster ends
and the next begins.  Assuming that a site has multiple clusters,
it would be nice to have only one nameserver handing out addresses
for a series of clusters (the nameserver could be sentient or
non-sentient).  Introducing some new method of indicating where 
clusters began or ended would be somewhat clumsy, and prone to 
errors. 

    Additional nameservers which cache initial replies are going
to defeat the distribution of tasks amongst the members of the
clusters.  Remote nameservers need to be told to act slightly
differently with this address (hence the CNAME with a low TTL,
but an A record with a normal TTL).  As much information as
possible is cached with the remote nameserver; however,
some information will have to be retrieved every time.  Authoritative
secondaries (in this scheme) can have at least an approximation
of the load sharing that is going on, while handing out records
to local sites - further minimizing unnecessary traffic.
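
    The effect of the two TTLs can be sketched with a toy model of a
remote caching nameserver.  This is only an illustration under stated
assumptions: the TTL values, the names and addresses, and the Cache,
Authority, and resolve helpers are hypothetical, and the authority
here simply alternates between members rather than applying weights.

```python
import itertools

CNAME_TTL = 5      # seconds; assumed "low" TTL for the generic name
A_TTL = 3600       # seconds; assumed "normal" TTL for the real hosts

# Hypothetical cluster members and their addresses.
ADDRESSES = {
    "hosta.example.edu": "192.0.2.1",
    "hostb.example.edu": "192.0.2.2",
}


class Cache:
    """Toy resolver cache: an entry is usable until its TTL has elapsed."""

    def __init__(self):
        self.entries = {}  # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl, now):
        self.entries[(name, rtype)] = (value, now + ttl)

    def get(self, name, rtype, now):
        hit = self.entries.get((name, rtype))
        if hit is not None and now < hit[1]:
            return hit[0]
        return None


class Authority:
    """Stand-in for the authoritative server; alternates between members."""

    def __init__(self, members):
        self._next = itertools.cycle(members)

    def pick(self, generic_name):
        return next(self._next)

    def address(self, host):
        return ADDRESSES[host]


def resolve(cache, authority, name, now, stats):
    """Follow generic name -> CNAME -> A, asking the authority on a miss."""
    target = cache.get(name, "CNAME", now)
    if target is None:
        target = authority.pick(name)
        stats["cname_queries"] += 1
        cache.put(name, "CNAME", target, CNAME_TTL, now)
    addr = cache.get(target, "A", now)
    if addr is None:
        addr = authority.address(target)
        stats["a_queries"] += 1
        cache.put(target, "A", addr, A_TTL, now)
    return addr


stats = {"cname_queries": 0, "a_queries": 0}
cache = Cache()
authority = Authority(list(ADDRESSES))
# Six client lookups, ten seconds apart, through the same remote cache.
answers = [
    resolve(cache, authority, "cluster.example.edu", t, stats)
    for t in range(0, 60, 10)
]
```

    In this run the short-lived CNAME is fetched on every lookup, so
the answers keep rotating, while each host's A record is fetched only
once and then served from the cache.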

>	How does the special server decide which address to return?  It
>could return the address of the machine with the lowest load average.
>It could return the addresses using a weighted round robin scheme (in
>which case, it's configuration file could even contain things that look
>like CIP records).  Or it could do something else...  The point is that
>the answer to the question in the first sentence is what the OSI people
>call a "local matter".

    Again, how does the nameserver share the concept that the
hosts are "clustered" - logically one?  I'd like my cluster backed
up, far away, preferably by my secondary.  And when my secondaries
hand out information about my clusters, I'd like them to be
honored.

>	Does the CIP mechanism have any advantage?  Yes, there's a small
>one.  Essentially, it standardizes the configuration information about
>generic names sufficiently that the config files are portable to other
>implementations of the mechanism.  Because the config files also happen
>to be zone files, the config information can also be zone transferred to
>other implementations.  However, I believe that the costs of the CIP
>proposal outweigh its benefits.  The standardization of the config
>information for generic names is achieved at the cost of requiring that
>anyone using the mechanism has to use weighted round robin (or else
>forget about CIP and use the current mechanism).  Additionally, the CIP
>proposal would have to be implemented, whereas the current mechanism is
>already implemented.

    No, not round robin; in fact, a later release implemented
something more efficient.  I don't believe consistent syntax to be
a small advantage - I'd like to have all of my nameservers handing
out this information, and I'd prefer to not hack it in.  In most
cases, all that is needed is a reasonable approximation of load
sharing, but many people need to do it for a lot of "clusters".
The ability
to cleanly indicate logical equivalence of a series of hosts
versus "magic domains" shouldn't be trivialized.

>	I can't speak for the DNS Working Group, but my own opinion is
>that the CIP record would bloat the standard for little if any real
>gain.  If an RFC is to be published on the topic, it should instead
>describe the currently used mechanism (and perhaps note where the
>existing implementations may be obtained).
>							Philip

    I have to admit that the CIP record is unusual, however
I do feel that it addresses a real need.  A zone can mean a lot
of things - a workgroup, an administrative unit, a delegation
of authority - but a cluster can only be one thing, and
it should be treated slightly differently.


						    Tp.
-- 

...!rutgers!brisco (UUCP)               brisco@pilot.njin.net (ARPA)
    brisco@ZODIAC (BITNET)              908-932-2351          (VOICE)

Just say "Moo"