Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!ncar!woods
From: woods@ncar.ucar.edu (Greg Woods)
Newsgroups: comp.protocols.tcp-ip.domains
Subject: BOGUS ROOT SERVERS!!
Message-ID: <9163@ncar.ucar.edu>
Date: 14 Nov 90 00:18:40 GMT
Reply-To: woods@ncar.UCAR.EDU (Greg Woods)
Organization: Scientific Computing Division/NCAR, Boulder CO
Lines: 48

(This first started last Wednesday and has continued through this morning)

We are having a serious problem with our name servers that APPEARS to be related to bogus root server data that is coming in from God knows where. Our configuration is that we have a primary server (ncar.ucar.edu a.k.a. handies.ucar.edu) which is the known server for our domain and is queried from the outside. We also have a server unknown to the outside which is configured as a secondary and is used as the forwarder by most of our internal machines (some internal machines still forward to the primary due to inertia; I don't control every server here so some of them are slow to change over as I have asked). (In case it matters, both are BIND 4.8.2, the primary is a Sun 4/280 running Sun OS 4.0.3, and the secondary is a Microvax II (a.k.a. boat anchor :-) running Ultrix 3.1)

What happens is that when a machine is rebooted (or named is restarted), it goes into an infinite loop burning tons of CPU time and refusing to answer queries. It also ignores all signals (except 9, of course) which makes debugging a real pain. Empirical evidence shows that every time this has happened, I find the following bogus root servers in both the primary and secondary servers' caches:

(root)  nameserver = MTECV1
(root)  nameserver = TELECOM
(root)  nameserver = NEXTSVR

These appear with no domains and with no corresponding A record which I suspect may be the root of the problem (pun not intended, I swear). If this junk is NOT in the cache, then name servers using one of these as the forwarder can be started fine.
If this junk *is* present, then killing and restarting first the primary and then the secondary (which of course removes the junk) will allow other servers here to be restarted. Occasionally I also see "lbl.gov" show up as a root server, but if it is there without these other three, it does not seem to cause the problem to occur. It occurs to me that the probable reason for that is that lbl.gov is a legitimate name that can be looked up and an A record eventually found, even if it isn't really a root server.

Has anyone else seen this? Does anyone have any idea what the &^$%#@! is going on? I am familiar with how the DNS works on an administrative and conceptual level, but I am not familiar with BIND on a source code level, nor does the rather cryptic output you get when you turn debugging on make a whole lot of sense to me (the latter is a consequence of the former, I expect).

Before I dive into the source code, I'd like to ask: is there any reason why data about the root domain coming in from outside should EVER be believed and cached? Has anyone patched BIND to disallow this? Will I break the entire DNS if I do this here? :-)

--Greg
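
To make the symptom described above concrete, here is a rough Python sketch of the kind of sanity check one might apply to root NS data before caching it: reject any name server name that has no domain part or no known A record. This is not BIND code and not a known patch; the function names and the 192.0.2.1 address are hypothetical placeholders chosen for illustration.

```python
def is_plausible_root_ns(name, a_records):
    """Return True if an NS target has a domain part and a known address.

    name      -- name server name as it appears in the NS record
    a_records -- dict mapping lowercase host names to lists of addresses
                 (hypothetical stand-in for whatever the cache holds)
    """
    host = name.rstrip(".").lower()
    labels = host.split(".")
    # A single-label name such as "MTECV1" has no domain at all.
    has_domain = len(labels) >= 2 and all(labels)
    # Without an A record the server can never actually be contacted.
    has_address = bool(a_records.get(host))
    return has_domain and has_address


def filter_root_ns(ns_names, a_records):
    """Split candidate root name servers into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for name in ns_names:
        if is_plausible_root_ns(name, a_records):
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


if __name__ == "__main__":
    candidates = ["MTECV1", "TELECOM", "NEXTSVR", "lbl.gov"]
    known_addresses = {"lbl.gov": ["192.0.2.1"]}  # placeholder address
    print(filter_root_ns(candidates, known_addresses))
```

Note that a check like this only catches unusable names. It would still accept lbl.gov, which matches the observation that lbl.gov alone does not trigger the loop, but it says nothing about whether a name is really a root server.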