Path: utzoo!attcan!uunet!lll-winken!ames!pasteur!ucbvax!tut.cis.ohio-state.edu!rutgers!att!ulysses!mhuxo!mhuxu!alux2!matthews
From: matthews@alux2.ATT.COM (John Matthews)
Newsgroups: comp.protocols.tcp-ip
Subject: NFS Performance through Routers
Message-ID: <237@alux2.ATT.COM>
Date: 18 Mar 89 21:30:56 GMT
Reply-To: ulysses!aloft!matthews@princeton.edu (John Matthews)
Organization: Laboratory 5223
Lines: 76

Last week we replaced a DEC Lan Bridge with a new Proteon P4200 router to create a local subnet for our building. Ever since, things have been running extremely slowly for the people who get their CAD software through our gateway. They rely heavily on huge CAD executables that get sent through the gateway. I have been looking into this for quite some time, and I am finally posting a message here to see what other people have done in similar situations.

What I have found is that default NFS mounts make reads and writes that are 8192 bytes long. The kernel gets these and in turn fragments each request into up to 9 UDP packets. If the Proteon discards even one of these packets, all 9 of them have to be retransmitted. I went around and changed all of the NFS mounts to do 1024-byte reads and writes. This seemed to improve things a little.

Another thing that I have noticed is that we are getting extremely high collision rates on the Suns. They add up to about a million and a half for the past week. Someone told me that the Suns don't abide by a standard that says they should wait 10 milliseconds between each packet they send to give others a chance to transmit. They told me they only wait 1 millisecond, if that. Could this be causing a lot of collisions? There are only around 15 Sun clients and one server sitting on each of two bridged Ethernets in the building where they are having all of the problems. In the main building we have things set up the same way, with collisions adding up to around 40,000.

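
The fragmentation effect described earlier (one lost piece forcing the whole 8192-byte request to be resent) can be sketched numerically. This is only a rough illustration: the 1500-byte MTU, the 20-byte IP and 8-byte UDP headers, the 128-byte RPC overhead, and the 1% per-fragment loss rate are all assumptions, not measurements from this network, and the fragment count it produces depends on those assumptions, so it need not match the figure of 9 quoted above.

```python
import math

# Hypothetical RPC/NFS header overhead per request, in bytes (an assumption).
RPC_OVERHEAD = 128


def fragments_per_datagram(rpc_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed for one UDP datagram of rpc_bytes payload."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((rpc_bytes + udp_header) / per_fragment)


def datagram_loss_probability(n_fragments, fragment_loss):
    """Chance that at least one fragment is dropped, assuming independent losses."""
    return 1.0 - (1.0 - fragment_loss) ** n_fragments


if __name__ == "__main__":
    loss = 0.01  # assumed per-fragment drop rate at the router
    for size in (8192, 1024):
        n = fragments_per_datagram(size + RPC_OVERHEAD)
        p = datagram_loss_probability(n, loss)
        print(f"rsize/wsize {size}: {n} fragment(s), "
              f"request lost with probability {p:.4f}")
```

Under these assumptions a larger transfer size both raises the chance that a request is lost and increases the amount of data resent each time, which would be consistent with the small improvement seen after dropping to 1024-byte reads and writes.
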
There does seem to be a lot of broadcasting going on on that Ethernet that could be causing the collisions. There is a problem stemming from the fact that older versions of UNIX try to forward IP broadcast packets. When these hosts receive a broadcast RIP packet addressed to 128.94.255.255, they think it's a packet destined for a specific machine and they then try to forward it. For every such packet, an ARP request is broadcast on the Ethernet. There are about 16 machines running the old network software and 5 routers generating up to 5 RIP packets every 30 seconds. I believe that added up to around 28,000 broadcasts per hour. Temporarily, I answered these ARP requests and pointed them to a device that would ignore them, but the network is still slow.

Is there anything wrong with responding to these ARP requests with an Ethernet address that doesn't really exist on that network? Then the machines running the old network software would just forward the packet into a black hole. Am I thinking right, or would this cause problems? What will the DEC Lan Bridges do with an Ethernet packet when they have no idea which side that Ethernet device is really on? Will every bridge throughout the network pass this packet every time it's sent?

Last night we tried to configure an extra Ethernet board on the fileserver that houses all of the CAD software and connect it to the other Ethernet cable to give them back some speed. All we did was uncomment the ie1 interface in the kernel config file, recompile the kernel, and reboot. We didn't change any of the /etc/*rc* files at all. When the Sun came back up, all of the old NFS mounts on the clients just timed out. The NFS daemons wouldn't service any NFS requests. I was able to use telnet and rlogin to connect to hosts on either side after manually ifconfig'ing the new ie1 interface. I gave up and tried it on another fileserver. It did the exact same thing.
The thing that doesn't make sense is that the only thing we did was add one Ethernet device to the kernel, and then nothing worked the way it used to. We rebooted on the old kernels and everything was back to normal. We called Sun but they didn't seem to know what the problem was. Has anyone else ever encountered such problems?

We are eventually going to move some of that software to servers in that building so that they aren't pounding on the gateway. I wasn't aware that the Proteon had so little bandwidth compared to a LAN bridge. How on earth can they go from Pronet-80 to Ethernet when they can't come close to handling Ethernet's full 10 megabits/s? What percent of 10 Mbits/s can a Proteon really route from one Ethernet to another? Has anyone done some real-life performance testing? What other things could I do to optimize NFS traffic?

If there are things that I am wrong about, please let me know. This has been a frustrating week, to say the least. If anyone could spare a few minutes on the phone, please e-mail me your phone number. I'd really appreciate it.

John Matthews
ulysses!aloft!matthews@princeton.edu
matthews@aloft.att.com
matthews@research.att.com