Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!bionet!csd4.milw.wisc.edu!lll-winken!gauss.llnl.gov!casey
From: casey@gauss.llnl.gov (Casey Leedom)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Domain Name Screaming
Message-ID: <28043@lll-winken.LLNL.GOV>
Date: 4 Jul 89 22:40:43 GMT
References: <8906292105.AA07216@fornax.ece.cmu.edu> <37397@sgi.SGI.COM>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: casey@gauss.llnl.gov.UUCP (Casey Leedom)
Organization: Lawrence Livermore National Laboratory
Lines: 70

| From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
|
|   1) some program decides to do gethostbyname(foo.bar) or
|      gethostbyaddr(1.2.3.4), checks with portmap & ypbind, and sends
|      an rpc request to the correct ypserv.
|   2) ypserv gets the request, fails to find the key in the YP map, and
|      since YP-to-DNS is turned on, forks a child which does a DNS
|      lookup.
|   3) the link to the DNS root or correct authoritative server is down or
|      congested, so the child does not get an answer for a while.
|   4) meanwhile, the original program in step #1 is waiting for the answer.
|      If step #3 takes long enough, the original program does a normal
|      YP-rpc timeout, retries, and everything is repeated from step #1
|
| This is worse than it looks because the time-out in step #4 is less than
| the one used by the child in step #3.  One can get large numbers of
| children of ypserv, all asking the local DNS server for the same answer.
|
| Some programs, ypmatch may be one, seem to try forever.  This would
| generate an unbounded, linearly increasing amount of DNS traffic, except
| that one usually runs out of resources for the local nameserver and
| ypserv parent.

Vernon has described the situation accurately.  Most of the time the
application in question is telnet, ftp or some other user-initiated
application.  When this happens the user usually gets tired of waiting
after a while and aborts.
This causes a minor transient load on the network and on the machine
running the YP server, but usually nothing you couldn't live through.

One application in particular doesn't get tired though: sendmail.  For
every piece of mail you have queued up, there will be a sendmail waiting
for address resolution.  In the stock versions of SunOS 3.X, sendmail
will hang forever.  (Note that it really isn't sendmail that hangs, but
rather the gethostbyXXXX(3) library routines.)

In any case, when I ran into this problem in November of 1987 I talked
about it to Bill Nowicki at Sun, who was their sendmail guru at the time
(and probably still is), and he gave me two new sendmail binaries.
Either one solves the problem described above.

The first binary is pretty much identical to the standard sendmail
binary, but includes timeouts around all the gethostbyXXXX calls.  He set
the timeout to 90 seconds, which I think is way too long, but it gets the
job done.  If a timeout occurs, sendmail treats it as a temporary
delivery failure and just leaves the mail queued.  It will return the
mail after the normal three days of trying if it can't deliver it in
that time.

The second binary is much more interesting however.  It completely
bypasses YP and goes directly to the name server itself.  Thus, you get
MX support!  The only thing you have to do to run the MX sendmail is
provide an /etc/resolv.conf file if the name server isn't running on the
local host.

If anyone wants them, I have both Sun2 binaries for SunOS 3.X.  (The
binaries will run fine on a Sun3 - trust me, I've been running them for a
year and a half now.)
They are available via anonymous ftp from lll-crg.llnl.gov under
llnl/named/sun:

    % ls -l ~ftp/llnl/named/sun
    total 461
    -r--r--r--  1 root       1214 Jan  3  1989 nslookup.help
    -rw-r--r--  1 root        786 Dec  1  1987 sun.rc.local.diff
    -rwxr-xr-x  1 ftp      172032 Dec  1  1988 sun2.sendmail*
    -rwxr-xr-x  1 ftp      196608 Dec  1  1988 sun2.sendmail.mx*
    -rwxr-xr-x  1 root      81920 Jan  3  1989 sun3.nslookup*
    lrwxr-xr-x  1 root         13 Dec 24  1988 sun3.sendmail@ -> sun2.sendmail
    lrwxr-xr-x  1 root         16 Dec 24  1988 sun3.sendmail.mx@ -> sun2.sendmail.mx

Casey
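
For anyone curious what "timeouts around all the gethostbyXXXX calls" might look like, here is a rough sketch in C.  It is not the code from Sun's binary, which I have not seen; it only illustrates the classic alarm()/longjmp approach.  The names lookup_with_timeout, lookup_fn, stub_hung_lookup and stub_fast_lookup are made up for this example, and the two stubs merely stand in for a resolver that hangs or answers at once.  Jumping out of a library routine from a signal handler can leave that routine's internal state inconsistent, so treat this as an illustration rather than something to ship.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <netdb.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical type: any lookup routine shaped like gethostbyname(3). */
typedef struct hostent *(*lookup_fn)(const char *);

static sigjmp_buf lookup_jmp;

/* SIGALRM handler: abandon the pending lookup. */
static void lookup_alarm(int sig)
{
    (void)sig;
    siglongjmp(lookup_jmp, 1);
}

/*
 * Run lookup(name), giving up after `seconds`.  On timeout, returns NULL
 * and sets *timed_out to 1 so the caller can treat it as a temporary
 * failure (e.g. leave the mail queued) rather than "no such host".
 */
struct hostent *lookup_with_timeout(lookup_fn lookup, const char *name,
                                    unsigned seconds, int *timed_out)
{
    struct sigaction sa, old_sa;
    struct hostent *volatile hp = NULL;

    assert(lookup != NULL && timed_out != NULL);
    *timed_out = 0;

    sa.sa_handler = lookup_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    if (sigsetjmp(lookup_jmp, 1) == 0) {
        alarm(seconds);          /* arm the timer */
        hp = lookup(name);       /* may block for a long time */
    } else {
        *timed_out = 1;          /* we arrived here via siglongjmp */
        hp = NULL;
    }

    alarm(0);                    /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);
    return hp;
}

/* Stand-in for a resolver that does not answer (illustration only). */
static struct hostent *stub_hung_lookup(const char *name)
{
    (void)name;
    sleep(30);
    return NULL;
}

/* Stand-in for a resolver that answers immediately (illustration only). */
static struct hostent *stub_fast_lookup(const char *name)
{
    static struct hostent answer;
    (void)name;
    return &answer;
}
```

A real caller would pass the library routine itself, along the lines of lookup_with_timeout(gethostbyname, "foo.bar", 90, &timed_out), with 90 being the timeout mentioned above.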