Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!purdue!decwrl!shelby!unix!hplabs!hpfcdc!hpldola!hp-lsd!col!bdale
From: bdale@col.hp.com (Bdale Garbee)
Newsgroups: comp.sys.hp
Subject: Re: Experience sought with large HP 9000 clusters
Message-ID: <2220005@col.hp.com>
Date: 27 Jun 89 00:13:59 GMT
References: <3517@cps3xx.UUCP>
Organization: HP Colorado Springs Division
Lines: 65

>I would like to hear from anyone who has done this sort of thing
>with this number of machines.

We have several clusters with a lot of machines on them, where "a lot" is
defined as 16 or so. I'll try to comment a bit. Please recognize that I am
speaking from personal experience, not as a representative of HP... I write
instrument firmware for a living...

>The cluster servers will be 9000/360's with 12mb of memory and a fast
>and a slow SCSI interface.

Not bad. We tend to gravitate towards 350's and 370's as servers because of
a perception that their split bus architecture allows more DMA throughput to
I/O devices than is possible on the 360. Perhaps someone more authoritative
will comment on whether this is true or not.

We always configure servers with ECC RAM. It costs more, and parity errors
are scarce, but when one does happen on the server, the whole cluster is toast
until the server reboots. They are rare enough that in your environment this
may be a don't-care. Here, it's a nightmare... emulator setups and such can be
costly to reload and restart in terms of engineering time. We typically run
8 MB of parity RAM in clients, and 16 MB for MEs and chip designers, whose
applications are large and hairy.

>The cnodes will mostly be 9000/340's with 8mb of memory and a 150mb HP7958B
>disk on the HPIB interface. The cnodes will all be configured for local swap.

Tasty! We run a mix of 320/350/360 clients. The 320's are slow; everything
else seems more than OK.

>Is this a reasonable number of cnodes per cluster?

It'll work.
Your expectations for disk performance may be much different from ours,
depending largely on the balance between time spent compiling and time spent
sitting in an editor, or in Frame, or in something else that isn't I/O
intensive. We tend to limit ourselves to 16 seats per cluster, with *nothing*
running on the server except sendmail and the like. As long as the load
average on the server stays below 1, all seems quite pleasant.

You should definitely configure your LAN with a bridge per cluster, putting
each server and its clients on their own thin strand, and you should be OK.
If you're not, come back later, add another server or two, and move clients
around. 120 clients on a single strand is a bad idea.

>Has anyone experienced problems running out of process ids in a large cluster?

Not the way you mean. We typically raise the nproc and maxuprc (I think)
parameters in the client kernels to allow more processes than the default,
since we used to bang our heads against the limits running X11 with lots of
windows. The defaults may be more reasonable now; I don't know. The global
process ID space seems to be large enough, at least for our clusters. Never
had a problem...

>Does anyone have a workaround for the inability to put spooled devices, e.g.,
>printers, on cnodes?

Sure. Use a named pipe. On the client, set up an inittab entry that cats
whatever arrives on the named pipe to the physical device; on the server, tell
the spooler to use the named pipe as the printer's device. This is explicitly
not supported, but local experience is that it works OK... I forget who
suggested this to me originally...

It should also be possible to un-CDF /usr/lib/lpsched. The easiest way would
be to go to the server, cd to /usr/lib/lpsched+, move remoteroot out of the
way, and replace it with a link to localroot, which would allow the scheduler
to run on the clients as well. Giving every printer in the cluster a distinct
name should handle all of the possible conflicts...
but I like the named pipe solution better, because you aren't dorking with
something an OS update will break, and there's only one copy of the scheduler
to lose sleep over.

Bdale
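The named-pipe workaround for cnode printers boils down to a process on the client that keeps copying whatever the spooler writes into the FIFO out to the physical device. Below is a minimal sketch in Python standing in for the `cat` in the inittab entry; the function names `pump_fifo_to_device` and `serve_forever` and any paths are hypothetical, and this is an illustration under those assumptions, not HP's (unsupported) setup. The outer loop matters because the reader sees end-of-file each time the spooler closes the pipe after a job, so the copier has to reopen the FIFO and wait for the next one.

```python
import os


# Hypothetical helper: a Python stand-in for "cat fifo > device" on the cnode.
def pump_fifo_to_device(fifo_path: str, device_path: str, chunk_size: int = 4096) -> int:
    """Copy one job from the FIFO to the device and return the bytes copied.

    Blocks in open() until a writer (e.g. the spooler) opens the FIFO, then
    copies until every writer has closed it, which shows up as end-of-file.
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path, 0o660)  # create the named pipe if it is missing
    copied = 0
    # buffering=0 gives an unbuffered FileIO, so each read() returns whatever
    # is available instead of waiting to fill a whole chunk.
    with open(fifo_path, "rb", buffering=0) as src, open(device_path, "ab") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # b"" means all writers have closed the pipe
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def serve_forever(fifo_path: str, device_path: str) -> None:
    """Reopen the FIFO after every job, the way a respawned cat would."""
    while True:
        pump_fifo_to_device(fifo_path, device_path)
```

On a cnode one might run something like `serve_forever("/dev/lp_fifo", "/dev/lp")`, with both paths hypothetical. With plain `cat`, an inittab entry using the `respawn` action could give the same reopen-after-each-job behavior, since `cat` exits at end-of-file and init restarts it.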