Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!usenet.ins.cwru.edu!cwjcc!ncoast!sgtech!adnan
From: adnan@sgtech.UUCP (Adnan Yaqub)
Newsgroups: comp.unix.xenix
Subject: Re: Xenix TCP/IP
Message-ID:
Date: 16 Feb 90 09:09:45 GMT
References: <1990Feb13.131255.3683@dlcq15.datlog.co.uk>
Sender: adnan@sgtech.UUCP
Distribution: comp
Organization: Star Gate Technologies, Inc.
Lines: 65
In-reply-to: cpm@dlcq15.datlog.co.uk's message of 13 Feb 90 13:12:55 GMT

In article <1990Feb13.131255.3683@dlcq15.datlog.co.uk> cpm@dlcq15.datlog.co.uk (Paul Merriman) writes:

   Problem 1)
   ---------

   Occasionally we get a kernel panic as follows:-

   TRAP 0000000E in SYSTEM, error code 06000000
   eax=FF030202 ebx=00000000 ecx=4A000001 edx=00000030
   esi=0008A204 edi=4A000001 ebp=06000620 fl=00010282
   udc=00030018 es=00000018 fs=0003003F gs=0000003F
   tr=00000100 pc=0090020:0001A12b ksp=060005B8
   kernel: PANIC: non-recoverable kernel page fault

I have seen this also. I assume you have tcp/ip 1.0.1d. We were
trying to get things going over StarLAN and the WD driver was buggy.
We contacted WD and got a new driver. We still get the panics, and
sometimes a message which says:

"qenable would have been called with NULL in wdsched() for XWAIT"

and then a panic. What module (use nm) is at the pc above?

SCO told us they have an even more recent WD driver than the one we
got from WD. They said they just fixed a bug on Friday, February 9,
1990!

   Problem 2)
   ----------

   This has been seen on the above Unisys machines with Western
   Digital network card and a Compaq with 3Com card.

   A number of processes which have socket connections to other
   machines break their connections. It should be mentioned here that
   these processes use non-blocking writes and an alarm call to
   determine when to "give up" on the write and break the connection.

   In one case you could not then connect to the machine across the
   network (telnet, rlogin), though the machine is running and can be
   accessed from the console.

   In some cases the connections have managed to re-establish
   themselves some time later.

We have a similar problem where our main host on the network goes deaf
(it can send out packets but not receive them). It seems to be load
related, i.e., it occurs when we have lots of activity into the
machine (4 or more telnet sessions). I used the streams watch
utility, sw, but couldn't see anything unusual.

We have another problem here with SCO TCP/IP: one host, the main one,
spits out "Note: tcp sum: source sum " every now and again. I assume
that these are warnings that a packet has been received with a TCP
checksum error. The scary thing is that the network is very clean and
the IP address of the source is sometimes the IP address of another
Xenix box on the network.

We have been told that the new TCP/IP code is in QA at SCO right now.

Our plan of attack is to try and get a copy of the new (newer :-) WD
driver and see if that helps things. We have not tried 3com boards.
Maybe we should. Also, it was suggested that we try doing some
telnets to ourselves (which uses the loopback driver) to see if the
problem is driver related or socket related. (If it just weren't so
intermittent...)

I hope this rambling helps. You have my sympathy.
--
Adnan Yaqub
Star Gate Technologies, 29300 Aurora Rd, Solon, OH, 44139, USA, +1 216 349 1860
[...cwjcc!ncoast ...uunet!abvax ...ism780c ...sco ...mstar]!sgtech!adnan
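
For reference on the "tcp sum" notes: a minimal sketch, assuming the message refers to the standard 16-bit ones'-complement Internet checksum (RFC 1071) that TCP uses. The name internet_checksum is hypothetical and nothing here is taken from the SCO code; a real TCP check also covers a pseudo-header with the source and destination IP addresses, which is left out to keep the sketch short.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Example bytes used in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)

# A receiver sums the data together with the transmitted checksum:
# an intact packet gives 0, a packet with a flipped bit does not.
intact = sample + checksum.to_bytes(2, "big")
corrupted = bytes([intact[0] ^ 0x01]) + intact[1:]
```

If a word is damaged anywhere between the sender computing the sum and the receiver checking it (on the wire, or in a driver on either end), internet_checksum(corrupted) comes out nonzero, which is presumably the condition the note reports.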