Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!amdcad!ames!pasteur!ucbvax!OFFICE-1.ARPA!WWB.TYM
From: WWB.TYM@OFFICE-1.ARPA (Bill Barns)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: ftp hang while trying to talk to cu20b.columbia.edu
Message-ID:
Date: 18 Feb 88 18:50:00 GMT
References: <12375523073.148.SY.KEN@CU20B.COLUMBIA.EDU>
Sender: usenet@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 64

Readers not interested in the guts of TCP implementation might as well
skip this message.

I've had to muck about with Tenex TCP which is "related" to TOPS-20 TCP,
and has much worse constraints with buffer space due to being part of a
single section monitor. Some of what I've done to try to cope with free
storage problems may be relevant to your monitor, but only you can tell
for sure. I think there must be a jillion locally-hacked subflavors of
this TCP code, and who knows how much resemblance remains between yours
and mine. I can say that I do have a copy of "DEC's source" as of about
2.5 years ago and it seems to have the same problems which I'm about to
describe, so maybe you have them too.

Refer to the TCP packetizer near PKTZ10 (in source file TCPPZ or
TCPTCP). The call on TCPIPK will nonskip-return if you are indeed out
of space. Code in a literal tries to queue you to retry but as I
understand the code, there's a problem. Your TCB is not queued anywhere
at this instant, but TSFP or TSEP is very probably on (else why are you
here?) So ENCPKT and/or DLAYPZ will effectively no-op and you're out of
the packetizer without being queued anywhere. Any future Force or
Encourage will meet the same fate because of those same bits. You're
trapped in the Twilight Zone. Cure: SETZRO ,(TCB) as the first thing in
the literal that calls ENCPKT after TCPIPK failure. If this scenario
happened, it would be likely to yield the symptoms described by David
Herron; but so might other things.

I made several changes to TCPPRC routine, a little too long to list here.
Basically they are: not to run a free-storage scavenge more than once a
second, so as not to hog the CPU; and don't run TVTOPR on any pass that
did a scavenge, in hopes of making fewer and bigger Telnet packets.

It's better to avoid running out of space in the first place, even if
that takes something drastic. With an 1822 interface it's absolutely
crucial not to let the input interrupt level run out of buffers, so as
to avoid RFNM-related deadlocks. Solution: Never give Internet the
"last" input buffer. Put it back on the input buffer list after
processing the 1822 leader. I suspect this isn't your problem though,
since your addresses are class B/C, thus probably not 1822.

It would help to have some idea of what most of your space is being
used for when you run out. Absent specific data, I'd suspect huge
retransmit queues caused by big windows and slow gateways between you
and the FTPers. You can brute-force cope with this somewhat either by
clamping received windows, or by finagling the packetizer to refuse to
packetize for any connection that has more than n packets on the RX
queue, or where the first packet on the RX queue has actually been
retransmitted (a quick test for congestion). This will slow things
down, but that's what you need to do when you're short of space. You
can condition this code on INTFSP being less than some threshold and
shove it into the PKTZ10 area too.

If you have a lot of TVT (Telnet) tinygram traffic, you might want to
add code in this same area to ask TCPIPK for only the size of buffer
you need, rather than a max size buffer, when space is below the
threshold.

Also in the OPSCAN routine (TTTVDV or TTANDV source file?) around
OPSCA1+10 or so, just after the
	JUMPE T3,OPSCA7
you might add
	JN TSEP,(TCB),OPSCA7
which will prevent this routine from undoing any delay previously
imposed by some other routine.
Further down in this same routine you should also have a change
published by Westfield and Crispin about 2-3 years ago which includes a
test on whether the RX queue is empty. This change is mainly
performance-oriented but will save free storage too in some situations.

These cover the highlights of things I've done that seem relevant. You
can talk bits with me further if you're interested, of course. I
wanted to post this much in case it stirs up any comments from TOPS-20
hackers out there. Maybe someone out there has already done these
changes in a form that will slide directly into CU20B monitor.

-b
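
PS: For anyone who would rather see the packetizer throttle described
above as logic than as prose, here is a rough sketch in Python rather
than MACRO, purely to pin down the test. It is a toy model, not monitor
code. Every name in it (should_packetize, Conn, RxPacket,
LOW_SPACE_THRESHOLD, MAX_RX_PACKETS) is made up for illustration, the
numbers are arbitrary, and free_space merely stands in for whatever
INTFSP reports.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical tuning knobs; the values are arbitrary placeholders.
LOW_SPACE_THRESHOLD = 20  # stand-in for "INTFSP less than some threshold"
MAX_RX_PACKETS = 4        # the "n" in "more than n packets on the RX queue"


@dataclass
class RxPacket:
    """One packet sitting on a connection's retransmit (RX) queue."""
    times_retransmitted: int = 0


@dataclass
class Conn:
    """Toy stand-in for the part of a TCB this test looks at."""
    rx_queue: List[RxPacket] = field(default_factory=list)


def should_packetize(conn: Conn, free_space: int) -> bool:
    """Return False when the packetizer should refuse this connection."""
    if free_space >= LOW_SPACE_THRESHOLD:
        return True   # space is not tight, so do not throttle at all
    if len(conn.rx_queue) > MAX_RX_PACKETS:
        return False  # too many packets already awaiting acknowledgement
    if conn.rx_queue and conn.rx_queue[0].times_retransmitted > 0:
        return False  # head of RX queue was retransmitted: likely congestion
    return True
```

The throttle only bites when space is short, so a connection with a
retransmitted packet at the head of its RX queue is refused at
free_space 5 but packetized as usual at free_space 100.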