Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!tut.cis.ohio-state.edu!ucbvax!UCBCMSA.BITNET!CLIFF
From: CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360)
Newsgroups: comp.sys.proteon
Subject: Re: 4-into-6 coding, and the "clasic" pronet-80 problem
Message-ID: <9003192208.AA25071@devvax.TN.CORNELL.EDU>
Date: 19 Mar 90 22:08:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 125

Hi,

We have had a fair amount of experience with this problem here at UC Berkeley, and I think we have essentially banished it in this form. With Proteon's help, you should be able to do so as well.

> Does anyone know what 33Hex maps into under the 4-into-6 bit coding
> used by Proteon? etc...

Hex 33 is useful because it is the ASCII code for the character "3", so you can easily fill a file with the character "3" (but don't put too many newlines or carriage returns in it).

Hex 33 maps into 100011100011 (each hex digit 3 becomes the 6-bit code 100011). When one hex 33 follows another, the bit stream becomes a repeating series of 3 zeros followed by 3 ones. I believe it is this lack of transitions over 3 bit times that is hard on controllers whose clocks are drifting out of spec.

There are several other data patterns that are at least as bad as this, in hex: 36, 63, 66, BE, BB, EB, EE, and undoubtedly more.

> Can anyone offer a technical explanation of the situation? Is it that
> the "rf" (120mbps) stages become miss-tuned, ...etc?

Well, I'm a software kind of guy, and our hardware techs sometimes use the phrase "programmer with a screwdriver" in a sarcastic way, so take what I say with some sized grain of salt. ;-)

Each active device on the p80 ring decodes the data that comes in using its own clock. If the data is for a node downstream, the device regenerates it, again using its own clock. This means that all the devices on the ring had better have clocks that are in close alignment with each other. The clocks are all supposed to run at 120 MHz +/- a tiny fraction (10 kHz?).
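To make the hex 33 point concrete, here is a small Python sketch that expands bytes into line bits and measures the longest stretch without a transition. It assumes only the one code word stated above (hex digit 3 is sent as 100011); the rest of Proteon's 4-into-6 table isn't given here, so the lookup table is deliberately incomplete, and the function names are made up for illustration.

```python
# Only one code word of the 4-into-6 table is known from the post:
# hex digit 3 -> line bits 100011. Any other digit is rejected rather
# than guessed (an assumption of this sketch).
KNOWN_CODES = {0x3: "100011"}


def encode_4into6(data: bytes) -> str:
    """Expand each byte, high hex digit first, into 6-bit code words."""
    bits = []
    for byte in data:
        for digit in (byte >> 4, byte & 0x0F):
            if digit not in KNOWN_CODES:
                raise ValueError(f"no known code word for hex digit {digit:X}")
            bits.append(KNOWN_CODES[digit])
    return "".join(bits)


def longest_run(bits: str) -> int:
    """Length of the longest stretch of identical bits (no transitions)."""
    best = cur = 0
    prev = None
    for b in bits:
        cur = cur + 1 if b == prev else 1
        best = max(best, cur)
        prev = b
    return best


if __name__ == "__main__":
    line_bits = encode_4into6(b"33")  # two ASCII '3' characters, 0x33 0x33
    print(line_bits)
    print("longest run without a transition:", longest_run(line_bits))
```

Running it prints the bit stream for two "3" characters and reports a longest run of 3 bits, matching the 3-zeros/3-ones pattern described above. Checking patterns such as 36 or BE the same way would need the rest of the code table, which this sketch does not assume.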
These clocks tick totally independently of each other; there is no "master" clock. This design appears (to me) to lead to some difficult debugging situations. You can have a ring that is working OK but has some clocks at the ragged edge; introduce a new node and all of a sudden your ring is shot. The new node may actually be OK, but you might "fix" the problem by putting in a different controller. Or you might "fix" it by plugging the controllers in, in a different order.

P3280s seem to have the worst problems. Maybe it's because they have two independent clocks, or because they get too hot in their little boxes, or maybe their circuitry is really different (big help, huh?).

> Has anyone found a way to "help" the situation?

>> What is the quick and effective way to find which p3280 or CTL
>> card among the many on the ring already out of alignment?

The only way I know of to deal with this requires real work, but it is what you have to do:

1) First you have to determine the order of the nodes in the ring. This is crucial because of the way the data is clocked and regenerated by each node: to pinpoint a problem node you have to know the exact path that data will take through your ring. To do this, go and look at your wire center. Data will flow around it in a counter-clockwise direction.

IMPORTANT: You have to realize that at the link level each packet goes all the way around the ring. Node A sends it to node B, and if all goes well node B sends it on around, back to A, with the ACK bit set. If all doesn't go well (either the ACK bit is off or the packet is trashed), node A will retransmit the packet (up to several times). Keep this in mind: it is the root mechanism that causes duplicate packets to show up, since B may already have a good copy when A, never having seen the ACK, sends it again. Also, if the path from B to A is bad, A will spend a certain amount of time retransmitting unnecessarily, and this will slow down throughput from A to B--although not nearly as much as it slows throughput from B to A.
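Once you have the ring order from the wire center, it helps to write down which nodes carry each leg of a packet's round trip from A to B and back. The following is a minimal Python sketch, assuming you list the nodes in the direction data flows (counter-clockwise around the wire center); the function ring_legs and the node names are hypothetical.

```python
def ring_legs(order, a, b):
    """Split the round trip of a packet sent from node a to node b.

    `order` lists the nodes in the direction data flows around the ring.
    Returns (a_to_b, b_to_a): the nodes that receive the data on each leg,
    in order, each list ending with that leg's destination.
    """
    if a == b:
        raise ValueError("source and destination must differ")
    n = len(order)
    i, j = order.index(a), order.index(b)
    # Walk downstream from a until reaching b, then from b back to a.
    a_to_b = [order[(i + k) % n] for k in range(1, (j - i) % n + 1)]
    b_to_a = [order[(j + k) % n] for k in range(1, (i - j) % n + 1)]
    return a_to_b, b_to_a


if __name__ == "__main__":
    ring = ["R1", "R2", "R3", "R4", "R5"]  # hypothetical wire-center order
    there, back = ring_legs(ring, "R2", "R4")
    print("R2 -> R4 leg:", there)
    print("R4 -> R2 leg:", back)
```

For a five-node ring R1 to R5 with A = R2 and B = R4, the A-to-B leg passes through R3 and R4, and the B-to-A leg through R5, R1 and back to R2. Following the reasoning above, if 3's crawl from B to A far worse than from A to B, the nodes on the B-to-A leg are the first ones to suspect.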
2) Next you need a way to test each node. Let's say you have p4200 routers, each with a p80 interface and some others, say an Ethernet. You need access to one of the Ethernets from each router. What you do is ship data across your ring. From point A to point B you ship (e.g.) a file with nothing but 3's in it. Then you ship a file of the same size with 1's in it (1's are innocuous). Then do the same tests from B to A.

-If the 3's are causing problems, you will see very different throughput rates for the 3's file and the 1's file.
-If there is only one broken node in the ring, you will see that the throughput for the 3's file is dramatically worse in one direction than in the other.
-If there are several broken nodes in the ring, you have a much more difficult hunt, but you can USUALLY get pretty far, if not all the way.

I've seen some strange things with this. Sometimes I've had to reorder things in the ring to find a bad component.

3) If you notice any funniness across your p3280 links, get your p3280s upgraded to the latest revs. We have not had this problem with our p3280s since we did this. (We have had a couple of total failures, but those are at least pretty easily identifiable.)

=====

I have some tools that can help. They are available for anonymous ftp from jade.berkeley.edu (128.32.136.9).

1) pub/ping.c and pub/ping.8: This lets you specify the data fill pattern for the packets sent, which helps you spot the problem early. Since each ping packet travels in both directions, however, it is no help in pinpointing the problem.

2) pub/netout.c: This sends data to the TCP discard port of a remote machine. You can specify the data fill pattern. This is easier to use for pinpointing things than ftp, since you don't need an account on the remote host. Unfortunately, not everybody has implemented the TCP discard port code.

=====

We can identify when we are starting to have problems in a couple of ways. One is from SNMP-collected output errors on the p80 ring interfaces.
Another is looking at "T 2" in the router consoles and seeing lots of 8704 errors on the p80 interfaces. "Lots" is defined very fuzzily in my mind--it's based on experience...

I don't mind discussing these problems with folks. I hope this is helpful to someone; my hands are tired. ;-)

Cliff Frost
(415) 642-5360
Central Computing Services
University of California
CLIFF AT UCBCMSA
Berkeley, CA 94720