Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!ucbvax!CAEN.ENGIN.UMICH.EDU!pha
From: pha@CAEN.ENGIN.UMICH.EDU (Paul H. Anderson)
Newsgroups: comp.sys.apollo
Subject: Re: Apollo SR10 Registry Performance
Message-ID: <46574bed8.000f088@caen.engin.umich.edu>
Date: 20 Oct 89 14:11:17 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 97

From: marmen@ucbvax.Berkeley.EDU
Message-Id: <114@bnrgate.bnr.ca>
Subject: Apollo SR10 Registry Performance

Our site has approximately 440 Apollos on ethernet running sr10.1.
The Apollos are set up to run BSD4.3 unix exclusively. AEGIS is not
installed.

We are experiencing some difficulties with registry nodes being
excessively overworked and occasionally melting down. Apollo is
recommending that we increase the number of registry servers.
However, they cannot recommend what the proper ratio should be, nor
can they tell us if the severe performance degradation will drop to
an acceptable level.

Word about the poor registry performance has gotten out to the users
and they are refusing to have registries placed on their machines.

Has anyone experienced this registry degradation? If so, what was
the solution? Finally, what ratio of registry servers to nodes are
you running? Our ratio is approximately 1/45. If I can get data on
the proper ratio, then I'll be able to convince the users to have
registries placed on their nodes.

cheers, rob...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Robert Marmen           marmen@bnr.ca  OR               |
| Bell Northern Research  marmen%bnr.ca@cunyvm.cuny.edu   |
| (613) 763-8244                                          |


Here at UM, we are getting towards the end of switching over 500
nodes to SR10.1, and have dealt with a number of unusual problems in
the transition.

1) The SR10 print server eliminated some optimizations for printing
of bitmaps. While Apollo apparently intends to replace the lost
speed, the stock 10.1 prsvr is quite slow.
For what it is worth, Mentor Graphics, and probably other companies
as well, haven't really caught up with the new NCS based printing
scheme, so vendors' packages such as Mentor's have had some problems
with printing.

2) We broke the cvtrgy program (for converting sr10 registry to sr9
registry) by going over about 5600 accounts, which created a
full_names file that was too large, and as a result broke one of the
libraries on the 9.7 machines. This was fixed, but the patch won't
show up until a later release than the stock 10.1 tapes. If you see
a failure of this type (with a large number of accounts), call Apollo
tech support - they already have the fix.

3) Because the registry is now server based, rather than filesystem
based (a really excellent idea!), strange things can happen that
don't really show up very well, except as unstable load on the
servers.

In particular, if you have a single registry, and think that adding
one additional slave will halve the load on the master, then think
again. If one or the other becomes unavailable, the clients will
time out, then switch to the other one, and not switch back until
that choice times out. So what you tend to get is the entire network
load switching from node to node.

The solution is to make sure that you add enough servers to truly
balance the load by offering a choice of remaining servers even if
one goes down. In my opinion, three servers are not enough, even for
only 100-150 nodes. Four servers may be a little marginal, and more
would be better. After getting four or five servers, there is less
need for additional ones, since as I mentioned before, the load will
tend to be balanced better. Someone at Apollo mentioned something
like 1 server for every 150 SR10 nodes, which is something I can
agree with, but for the first 150 nodes, you probably want to see at
least four servers.

The performance of the registry depends largely on the offered load.
In our environment (lots of student labs), we see a fair amount of
registry activity, due to the dynamically changing load caused by
students moving between nodes at random (and doing it often - you
can expect complete lab turnover in the 10 minutes between classes).

So... if you are in a more stable environment, the servers will see
less load. If you have a large registry, however, like we do
(thousands of accounts), you can expect the server to pretty much
eat up a DN3000 with 4 megs. A DN4000 with 8 megs can be used
interactively, but is somewhat slower than I would like. But keep in
mind that what I would really like is a desktop DN10000 (hint for
Randy, my boss!).

4) The registry also has a problem that we are still working on,
where creation of large numbers of accounts (300 per day) bogs the
registry down to the point where we can do only a few transactions
per minute. Since this may be related to problem 3) above, and
because we can no longer easily test it, we expect to have to wait
until next term, when we next get a high number of new accounts.

Good luck!

Paul Anderson
CAEN Apollo Systems Programmer
University of Michigan
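
P.S. The arithmetic behind point 3) can be sketched roughly as
follows, for anyone who wants to plug in their own node counts. This
is only one reading of the rule of thumb given there (four servers
for the first 150 nodes, then one more per further 150); the function
names and parameters are made up for illustration and are not part of
any Apollo tool. The per-server figure assumes the clients spread
evenly over the servers that are still up, which is optimistic: the
load may well move in lumps rather than spreading evenly.

```python
import math


def suggested_servers(nodes, first_block=150, min_servers=4,
                      nodes_per_extra=150):
    """Registry servers suggested by one reading of the rule of thumb.

    Assumption: min_servers cover the first first_block nodes, and
    each further nodes_per_extra nodes (or part of it) adds one more.
    """
    if nodes <= 0:
        return 0
    extra_nodes = max(0, nodes - first_block)
    return min_servers + math.ceil(extra_nodes / nodes_per_extra)


def clients_per_server(nodes, servers, down=0):
    """Average clients per surviving server, assuming an even spread."""
    alive = servers - down
    if alive <= 0:
        raise ValueError("no registry servers left")
    return nodes / alive


if __name__ == "__main__":
    for n in (150, 440, 500):
        print(n, "nodes ->", suggested_servers(n), "servers")
    # With two servers, losing one puts every client on the survivor;
    # with four, each survivor picks up only a third of the total.
    print(clients_per_server(150, 2, down=1))
    print(clients_per_server(150, 4, down=1))
```

The last two lines show why a lone master plus one slave does not
halve anything once a server drops out, while four servers leave the
survivors with a share they can reasonably carry.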