Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!usc!ucsd!ucbvax!PAN.SSEC.HONEYWELL.COM!thompson
From: thompson@PAN.SSEC.HONEYWELL.COM (John Thompson)
Newsgroups: comp.sys.apollo
Subject: re: re: More problems with SR10.1 on 2nd disk
Message-ID: <9006020144.AA02755@umix.cc.umich.edu>
Date: 2 Jun 90 01:24:27 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 71

Netlanders --

Sorry about the partial mail message. I accidentally used "." as an indent
character, and ended up with "." at the start of a line all by itself.
Here's the message I _meant_ to send out:

> Problem:
> --------
> The DN4500 boots off the 2nd disk running SR10.1 (the 1st disk has SR10.2
> loaded on it). During the first couple of minutes, I can see all the Apollos
> on the network by running 'lcnode'. I can also list/access all the files on
> another Apollo ("kings") by doing 'ls //kings'. This seems to work only for
> 2 or 3 tries. Around the 3rd or 4th try, everything freezes up (and
> the machine becomes virtually unusable). After what seems an eternity
> the 'ls //kings' command returns with output like:
>
>     file XXXX not found
>     file YYYY not found
>     etc. etc.
>
> At this point doing an 'lcnode' also shows that, this node is not seeing
> any other Apollo node on the network. Or for that matter trying to 'rlogin'
> into this machine from any of our Suns (or vice-versa) doesn't work.
>
> Can somebody please point me as to what is wrong with this machine?

It sounds to me like somebody on your ethernet / token-ring is running rtsvc
with a non-zero network ID (note: NOT tcp network number). This is normally
used when you set up a domain internet (aka transparent domain). It provides
the Domain Distributed System (DDS) across a fast internet (Full T1 speed or
greater is the 'supported' speed -- we've done it at < 56K baud, I believe).

At any rate, if you do a "rtsvc" on your various nodes, you'll find something
like:

: $ rtsvc
:
:     Controller        Net ID     Service offered
: ==================   ========   ====================
:     RING               28124    Own traffic only
:     ETH802.3_AT            0    Port not open
:

Note that our token ring is network 28124 (not REALLY -- for security reasons
I don't broadcast the real DDS network). We used to have the ethernet running
DDS services too, but no longer need it (a sister division closed).

If routing is enabled, you'll see a service of "Internet router". Any node
with that service will broadcast the network number that it believes is
correct. Any node that doesn't offer routing will listen for those packets,
and after a short time (15 minutes?), will modify its own network number if
necessary. The initial network that a node uses is stored (non-readable) in
`node_data/hint_file, if I remember right.

It sounds to me like you have 2 or more nodes broadcasting different net
numbers. If the DDS net doesn't match, nodes won't talk to each other except
through a router node (2 controllers, each offering routing). If your
/etc/hosts table (or other tcp info) is linked from your DN4500 off to another
node, you won't be able to locate it, and rlogin will fail.

If this is the correct cause of the problem, just check all the nodes that are
physically connected to your ethernet / token-ring ('rtsvc' is in /com and
/etc), and you'll find several conflicting networks. Correct them by using
'rtsvc -dev -net ', and you should be ok. It might be necessary to disconnect
some nodes from the network before doing this, as they may become so
brain-dead that they can't cope with the rest of the nodes.

Good Luck!

John Thompson
Honeywell, SSEC
thompson@pan.ssec.honeywell.com
thompson@animal.ssec.honeywell.com
thompson%pan.ssec.honeywell.com@cim-vax.honeywell.com

Don't blame anyone but me for my opinions. (Well, maybe my parents are
responsible).
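
Addendum to the "check all the nodes" step above: if you save the rtsvc
output from each node, a short script can flag the net IDs that disagree.
What follows is only a rough sketch in Python, assuming the output looks
like the sample in this post (one controller per row, net ID in the second
column). The node names, the function names, and the idea of collecting the
output into a dictionary are all hypothetical; none of this is something
rtsvc itself provides.

```python
from collections import defaultdict


def parse_rtsvc(text):
    """Parse rtsvc-style output into (controller, net_id, service) tuples.

    Assumes the three-column layout shown in the post; a leading ':'
    indent character, the header row, and the '====' rule are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.lstrip(": ").strip()
        if not line or line.startswith(("=", "Controller", "$")):
            continue
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        controller, net_id, service = parts
        rows.append((controller, net_id, service))
    return rows


def find_conflicts(outputs_by_node, controller):
    """Group nodes by the net ID they report on one controller.

    Ports reported as "Port not open" are ignored. Returns an empty dict
    when every node agrees, otherwise {net_id: [node, ...]}.
    """
    seen = defaultdict(list)
    for node, text in outputs_by_node.items():
        for ctrl, net_id, service in parse_rtsvc(text):
            if ctrl == controller and service != "Port not open":
                seen[net_id].append(node)
    return dict(seen) if len(seen) > 1 else {}
```

Feed it a dictionary mapping each node name to that node's saved rtsvc
output, plus the controller name (e.g. "RING"). A non-empty result lists
which nodes hold which net ID, which tells you where to start correcting.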