Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!PAN.SSEC.HONEYWELL.COM!thompson
From: thompson@PAN.SSEC.HONEYWELL.COM (John Thompson)
Newsgroups: comp.sys.apollo
Subject: re: More problems with SR10.1 on 2nd disk
Message-ID: <9006011608.AA11441@umix.cc.umich.edu>
Date: 1 Jun 90 15:59:02 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 72

> Problem:
> --------
> The DN4500 boots off the 2nd disk running SR10.1 (the 1st disk has SR10.2
> loaded on it). During the first couple of minutes, I can see all the Apollos
> on the network by running 'lcnode'. I can also list/access all the files on
> another Apollo ("kings") by doing 'ls //kings'. This seems to work only for
> 2 or 3 tries. Around the 3rd or 4th try, everything freezes up (and
> the machine becomes virtually unusable). After what seems an eternity
> the 'ls //kings' command returns with output like:
>
> file XXXX not found
> file YYYY not found
> etc. etc.
>
> At this point doing an 'lcnode' also shows that, this node is not seeing
> any other Apollo node on the network. Or for that matter trying to 'rlogin'
> into this machine from any of our Suns (or vice-versa) doesn't work.
>
> Can somebody please point me as to what is wrong with this machine?

It sounds to me like somebody on your ethernet / token-ring is running
rtsvc with a non-zero network ID (note: NOT the TCP network number). This
is normally used when you set up a domain internet (aka transparent domain).
It provides the Domain Distributed System (DDS) across a fast internet
(Full T1 speed or greater is the 'supported' speed -- we've done it at
< 56K baud, I believe). At any rate, if you do a "rtsvc" on your various
nodes, you'll find something like

    $ rtsvc

    Controller          Net ID    Service offered
    ==================  ========  ====================
    RING                28124     Own traffic only
    ETH802.3_AT         0         Port not open

My node has 2 controllers (it used to be a DDS router node).
The ring is the only one we use (for DDS) now. Its network ID is 28124
(not really; for security reasons I changed it from what it REALLY is).
If we had routing turned on, the service offered would be "Internet routing."

I'd guess that at least one node on your net has the net ID set, and is
broadcasting it. Domain nodes figure out what DDS net they're in by using
the "hint_file" in `node_data when they boot up. If they hear somebody
broadcasting a different network, they update themselves after a short time
(15 minutes?), unless they are a router node (routing nodes are the only
ones that broadcast net-numbers). If you have a couple of nodes that are
set up with routing enabled (even if they only have 1 controller), your
nodes will eventually get confused if the net addresses conflict. You can
fix this by correcting the nodes with conflicting net numbers. The rtsvc
command is located in /com (Aegis) and in /etc (which I mention since you
appear to be a Unix house).

Now that I've spoken authoritatively on the subject, let me say that it
doesn't necessarily explain everything. I would expect that you'd see at
least ONE other node on the Apollo network (lcnode), because SOMEBODY else
would have the same network ID. rlogin (and all TCP/IP services) should
continue to work, unless you have a file (/etc/hosts, for instance) linked
over to a non-communicating node (e.g. //kings).

Good luck!

John Thompson
Honeywell, SSEC
Plymouth, MN 55441
thompson@pan.ssec.honeywell.com
thompson@animal.ssec.honeywell.com

Don't blame Honeywell for my opinions. Any address corruptions caused by
the mailer should be sent to /dev/mentor -- working with their mail system
has ruined sendmail and my sanity. :-(
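
One way to hunt for the node with the stray net ID is to save the rtsvc
output from each node and compare the Net ID columns. What follows is a
rough Python sketch of that comparison, not anything that ships with
Domain/OS. It assumes the column layout matches the single rtsvc example
shown in this post, and that the saved output has already been collected
somehow. The node names (node_a, node_b), the second net ID (31007) and the
helper names parse_rtsvc and summarize are all made up for illustration.

```python
import re
from collections import defaultdict

# One data row of the rtsvc table, e.g. "RING   28124   Own traffic only".
# ASSUMPTION: the layout is inferred from the one example in the post.
# A leading ". " (as some mailers need) is tolerated and ignored.
ROW = re.compile(
    r"^[.\s]*(?P<controller>[A-Z][A-Z0-9._]*)\s+"
    r"(?P<net_id>[0-9A-Fa-f]+)\s+"
    r"(?P<service>\S.*?)\s*$"
)

def parse_rtsvc(text):
    """Return (controller, net_id, service) tuples found in rtsvc-style text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Keep the ID as text; only normalise leading zeros.
            net_id = m["net_id"].lstrip("0") or "0"
            rows.append((m["controller"], net_id, m["service"]))
    return rows

def summarize(outputs):
    """outputs maps a node name to its saved rtsvc text.

    Returns (ids, routers): ids maps each non-zero net ID to the nodes
    reporting it; routers is the set of nodes whose service mentions routing.
    """
    ids = defaultdict(set)
    routers = set()
    for node, text in outputs.items():
        for _controller, net_id, service in parse_rtsvc(text):
            if net_id != "0":
                ids[net_id].add(node)
            if "routing" in service.lower():
                routers.add(node)
    return dict(ids), routers

# Hypothetical saved output from two nodes (node_b's values are invented).
SAMPLE = {
    "node_a": """
        Controller          Net ID    Service offered
        ==================  ========  ====================
        RING                28124     Own traffic only
        ETH802.3_AT         0         Port not open
    """,
    "node_b": """
        Controller          Net ID    Service offered
        ==================  ========  ====================
        RING                31007     Internet routing
    """,
}

if __name__ == "__main__":
    ids, routers = summarize(SAMPLE)
    if len(ids) > 1:
        print("more than one non-zero net ID seen:", sorted(ids))
    print("nodes offering routing:", sorted(routers))
```

On a site that is supposed to be a single DDS network, more than one
non-zero net ID in the result, or any node offering routing that shouldn't
be, is the thing to go look at. A real router node legitimately shows
different net IDs on its controllers, so treat the result as a hint only.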