Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!dcdwest!ittvax!decvax!decwrl!sun!mojo
From: mojo@sun.uucp (Joseph Moran)
Newsgroups: net.unix-wizards
Subject: Re: File system limit in 4.2 BSD
Message-ID: <2089@sun.uucp>
Date: Sun, 31-Mar-85 14:14:08 EST
Article-I.D.: sun.2089
Posted: Sun Mar 31 14:14:08 1985
Date-Received: Wed, 3-Apr-85 01:40:54 EST
References: <681@rayssd.UUCP>
Reply-To: mojo@sun.UUCP (Joseph Moran)
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 42
Keywords: cmap, Fastreclaim
Summary: changing cmap screws vax Fastreclaim code

In article <681@rayssd.UUCP> dhb@rayssd.UUCP writes:
>Has anyone ever successfully gotten more than 15 file systems on a 4.2 BSD
>system? After many long delays, we are finally going to convert from 4.1
>to 4.2, and we need to be able to mount more than 15 file systems. I tried
>making the same changes that I made in 4.1 (increase the size of mdev in the
>cmap stucture, increase NMOUNT and NSWAPX in param.h, fix mount/umount) but
>it doesn't seem to work. I even talked to Mike Karrels in Dallas and he
>indicated that that was all I had to do. The problem we are experiencing
>is that random processes dump core at random times. This can be very
>annoying if the shell core dumps, and it can be disastrous if "init" core
>dumps. The behaviour seems to indicate some kind of swapping error. At
>first I didn't even associate this problem with the changes to the coremap
>structure but in a final act of desperation I backed off the change and
>now the system runs fine. We have been trying to track what we thought
>was a weird swapping error for three months (tues and wed eve.) and have
>now been running smoothly WITHOUT the coremap changes for over two weeks.
>
> ...

Your problem is the "Fastreclaim" code in vax/locore.s. This code is an
optimization put into 4.2. This code knows about the cmap structure.
If you change anything in the cmap structure without rewriting this code,
you are bound to get bad paging problems. As it turns out, you can take
out the call to Fastreclaim, since it is simply an optimization; in the
long run you'll want to rewrite the code for your new cmap structure.

This code also knows a few other magic numbers without using the right
symbols (like UPAGES) to reference them. That second problem can be
avoided by figuring out what the magic numbers in the code stand for and
putting in expressions that use the right symbols.

We were bitten by this same problem twice here at Sun. We changed the
cmap structure for use with the nfs (network file system). We had a hard
time figuring out why random pages got paged in incorrectly and processes
were dying when we were running the nfs kernel, until it was tracked down
to Fastreclaim. Later we were playing with changing UPAGES and got bitten
by Fastreclaim again. Sometimes changing .h files doesn't do everything
it really needs to.

Hats off to Bill Shannon for finding both of these.

Joe Moran
sun!mojo
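
To illustrate the point about magic numbers versus "the right symbols",
here is a minimal sketch in C. Everything in it is an assumption made for
illustration: the structs cmap_old and cmap_new, their fields, the
constants ASM_ENTRY_SIZE and ASM_PAGE_OFFSET, and both helper functions
are hypothetical and do not reproduce the real 4.2 BSD cmap layout or the
Fastreclaim code. It only shows how numbers baked into hand-written code
go stale when a field is widened, and how the same numbers could instead
be derived from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for a core-map entry.
 * These are NOT the real 4.2 BSD cmap fields or layout. */
struct cmap_old {
    uint32_t c_blkno;
    uint8_t  c_mdev;    /* narrow mount-table index */
    uint8_t  c_flags;
    uint16_t c_page;
};

struct cmap_new {
    uint32_t c_blkno;
    uint16_t c_mdev;    /* widened so more file systems can be indexed */
    uint8_t  c_flags;
    uint16_t c_page;    /* moves because of the wider c_mdev plus padding */
};

/* Magic numbers a hand-written assembly routine might bake in.
 * They were worked out by hand from cmap_old (hypothetical values). */
enum { ASM_ENTRY_SIZE = 8, ASM_PAGE_OFFSET = 6 };

/* Returns 1 if the baked-in numbers agree with the given layout, else 0. */
static int magic_numbers_match(size_t entry_size, size_t page_offset)
{
    return entry_size == ASM_ENTRY_SIZE && page_offset == ASM_PAGE_OFFSET;
}

/* Writes the layout of cmap_new into buf as symbol definitions that an
 * assembly file could include, so the numbers are regenerated from the
 * header instead of being typed in by hand.  Returns what snprintf returns. */
static int emit_symbols(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "#define CMAP_SIZE %zu\n#define C_PAGE %zu\n",
                    sizeof(struct cmap_new),
                    offsetof(struct cmap_new, c_page));
}
```

With the usual natural-alignment rules, magic_numbers_match() accepts the
size and c_page offset of cmap_old and rejects those of cmap_new, which
is the kind of silent mismatch described above. Generating the symbols
with sizeof and offsetof means a header change shows up in the assembly
the next time the kernel is built.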