Path: utzoo!attcan!uunet!clyde.concordia.ca!mcgill-vision!bloom-beacon!snorkelwacker!tut.cis.ohio-state.edu!ucbvax!hplabs!hpfcso!hpfcdc!daveg
From: daveg@hpfcdc.HP.COM (Dave Gutierrez)
Newsgroups: comp.sys.hp
Subject: Re: Booting discless over multiple LANs
Message-ID: <5570377@hpfcdc.HP.COM>
Date: 7 Feb 90 20:27:29 GMT
References: <3249@plains.UUCP>
Organization: HP Ft. Collins, Co.
Lines: 104

>
>> The reason for implementing
>> the boot procedure in this way is to make the system be able to
>> recover from network errors - a diskless system is able to check the
>> LAN if it has a failure. If it detects that the LAN is broken then
>> the client will wait indefinitely for the LAN to be fixed and the
>> server to respond.

Actually, the booting procedures have nothing to do with LAN break
detection. Network errors fall into a class of their own, which is not
really relevant to this topic.

>
>Why can't a discless node realize this after it is running? If you
>unplug a discless node from the network (accidentally one hopes :-)
>it almost immediately panics. Couldn't there be a way to make it
>wait for a few minutes before panicking?
>
>*********************************************************************
>Tony Burzio               * Don't touch that ..FRZZZZZT.. cord!
>Martin Marietta Labs      *            Sigh...
>mmlai!burzio@uunet.uu.net *
>*********************************************************************

I'll try to provide a high-level description of how LAN break detection
works.

Together with the recovery and selftest code, the diskless HP-UX protocol
can often survive a broken or unterminated LAN cable [1]. However, some
LAN cable topologies must be taken into account before you configure a
diskless cluster, so that the integrity and survivability of the cluster
are maintained at all times.
The diskless LAN break detection and recovery code will not detect a
broken or unterminated LAN if the diskless cnodes and their root server
cnode are on opposite sides of a LAN bridge box, a LAN repeater, or any
other device that acts as a terminator (Fig. 1).

In addition, if the MAU or AUI cable is disconnected from a diskless
cnode, the root server will probably declare the cnode dead once a
certain number of selftest periods have passed. This situation is
detectable only on the diskless cnode in question; the rest of the
backbone is still functioning correctly. The converse case is a
disconnected MAU or AUI cable on the root server, which will most likely
cause the diskless cnodes to lose contact with the root server.

If the LAN cable is broken or unterminated, the following messages may be
displayed. They are considered recoverable temporary failures:

  o Suspected backbone cable not properly terminated.
  o Suspected backbone cable not properly terminated or MAU disconnect.
  o Suspected AUI cable disconnected from MAU or grounded backbone cable.

The following messages may also be displayed; they are non-recoverable
failures:

  o Panic(Diskless: LAN Failure, Unknown Cause)
  o Panic(FATAL ERROR: DISKLESS LAN FAILURE: Card State = X)
    (where X is a number that is interpreted for you as one of the
    following panics).
Panics:

     Panic(Diskless: LAN Interface Card Failure)
     Panic(Diskless: LAN Link Failure)
     Panic(Diskless: LAN Hardware Failure)
     Panic(Diskless: Lan Failure, Invalid Card State)


       ---------         ---------         ---------
       |   W   |         |   W   |         |   W   |
       |       |         |       |         |       |
       ---------         ---------         ---------
           |                 |                 |
  o---------------------------------------------------o  LAN Segment A
                             |
                       -------------
                       |  Bridge   |
                       ----- or ----
                       | Repeater  |
                       -------------
                             |
  o---------------------------------------------------o  LAN Segment B
           |                 |                 |
       ---------         ---------         ---------
       |   W   |         |   W   |         |   S   |
       |       |         |       |         |       |
       ---------         ---------         ---------

   If segment B were broken or unterminated, the diskless cnodes (W) on
   segment A would lose contact with their root server (S). The diskless
   cnodes and root server on segment B would recognize the problem,
   continue local processing (if possible), and wait until the LAN is
   repaired. After the LAN is repaired there may be a slight delay before
   the recovery code declares the LAN UP and transmission and reception
   resume.

                                 Fig. 1

Footnotes
---------
[1] Backbone cable reconfiguration on an active diskless cluster is never
    recommended. Good practice dictates that the entire cluster be shut
    down before performing any sort of backbone cable maintenance.
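For readers who like to see the rules spelled out, here is a small Python model of two points made above: which console messages are recoverable versus fatal, and (from Fig. 1) which nodes can actually notice a segment break. It is a hypothetical sketch, not HP-UX code. The names `Severity`, `classify`, `sees_break` and `cnode_outcome` are made up for illustration, message matching assumes the console prints the text exactly as listed in this post, and the topology rule is a simplification: a bridge or repeater terminates each segment, so only nodes wired to the broken segment see the fault.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    RECOVERABLE = "recoverable temporary failure"  # node waits for the LAN to be fixed
    FATAL = "non-recoverable failure"              # node panics


# Recoverable messages, copied from the list in this post.
# Assumption: the console prints them exactly like this.
RECOVERABLE_MESSAGES = frozenset({
    "Suspected backbone cable not properly terminated.",
    "Suspected backbone cable not properly terminated or MAU disconnect.",
    "Suspected AUI cable disconnected from MAU or grounded backbone cable.",
})


def classify(message: str) -> Optional[Severity]:
    """Map a console message to a severity, or None if it fits neither group."""
    text = message.strip()
    if text in RECOVERABLE_MESSAGES:
        return Severity.RECOVERABLE
    # Every non-recoverable message in the post is a Panic(...) string.
    # Simplification: any Panic(...) is treated as fatal.
    if text.startswith("Panic(") and text.endswith(")"):
        return Severity.FATAL
    return None


def sees_break(node_segment: str, broken_segment: str) -> bool:
    # The bridge/repeater acts as a terminator, so a node can detect
    # the fault only if it sits on the broken segment itself (Fig. 1).
    return node_segment == broken_segment


def cnode_outcome(cnode_segment: str, server_segment: str, broken_segment: str) -> str:
    """Rough outcome for one diskless cnode when a single segment is broken."""
    if sees_break(cnode_segment, broken_segment):
        return "detects break, waits for repair"
    if server_segment == broken_segment:
        # The cnode's own segment looks healthy, but its root server
        # is behind the bridge on the broken segment.
        return "loses contact with root server"
    # Assumption: a break elsewhere does not affect this cnode
    # (the post does not discuss this case).
    return "unaffected"


# The Fig. 1 scenario: segment B is broken and the root server S is on B.
for where in ("A", "B"):
    print(f"cnode on {where}: {cnode_outcome(where, 'B', 'B')}")
```

Running it prints `cnode on A: loses contact with root server` and `cnode on B: detects break, waits for repair`, which matches the description under Fig. 1: the cnodes across the bridge never see the break, they just stop hearing from the server.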