Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!asuvax!ncar!elroy.jpl.nasa.gov!sdd.hp.com!zaphod.mps.ohio-state.edu!casbah.acns.nwu.edu!ucsd!hub.ucsb.edu!spectrum.CMC.COM!lars
From: lars@spectrum.CMC.COM (Lars Poulsen)
Newsgroups: comp.dcom.sys.cisco
Subject: Re: Appliques can fail .....
Message-ID: <1991Mar8.174343.6135@spectrum.CMC.COM>
Date: 8 Mar 91 17:43:43 GMT
References: <32714@boulder.Colorado.EDU>
Organization: Rockwell CMC
Lines: 71

In article <32714@boulder.Colorado.EDU> Daniel.Karrenberg@cwi.nl
(Daniel Karrenberg) writes a great and detailed "war story" about
customer-debugging of a serial line problem involving a pair of
cisco routers connected via V.35 modems.

> ....
>We were having some strange problems with that link. The keepalives were
>getting thru OK and small IP packets were doing reasonably well (1-2% loss).
>Large IP packets (1000B) weren't getting thru at all.
> ....
>We subsequently swapped a few appliques with the conclusion that
>old ones (bar code serial <10000) consistently don't work on this line.
>New ones do work although the link is not 100% stable yet but this might
>be due to other problems.
>
>Lessons learned:
>
> 1) In some (rare) circumstances local loopbacks
>    do not detect local problems.

Being originally (and now again) a software engineer, I spent a couple
of years running a customer support organization for similar stuff.

A possible source for the problem could be an engineering / design
error in the V.35 applique[1]. I don't know if cisco had such an error,
but several implementors have had the same problem. For some reason,
designers of serial interfaces have a hard time keeping their plusses
and minuses straight, especially on synchronous interface clocks.

Synchronous modem clocking is intended to be set up such that the data
is sampled in the middle of the bit cell, where it is presumably most
stable, and "ringing", "overshoot", "round shoulders" and other
boundary effects at the edge of the bit cell have died down.

If the clock is inverted, the data will instead be sampled near the
edge of the bit cell. You would think that it would not work at all,
but with some luck it will actually work part of the time, and the
link will be enormously sensitive to minor changes in cabling,
grounding etc. Of course, loopbacks will work fine, since there will be
symmetrical inversions on the send and receive side. It will also work
fine in local "null modem" hookups in the lab.

The V.35 interface only started to come into widespread use four years
ago, and most manufacturers started to build them "from paper": having
only a spec to work from[2], and no compatible equipment to compare and
test against. I know of several manufacturers that got several products
out to the field with design problems, both on the DTE side and on the
modem side. The embarrassment at making a design error that can be
described in terms this simple has led to coverups that have greatly
complicated the recovery process.

Note that having spare appliques would not have helped you, since they
would have been of the same engineering revision as the original ones.

Footnotes:
[1] Why does cisco use the word "applique" instead of "adapter"?
    I have seen many computer operators confused by the term.
[2] The spec even has problems. There seem to be two different
    physical connectors allowed. I vaguely remember that they
    looked identical, but one used metric dimensions, the other
    inches ...
[3] The above should not be construed as a putdown of cisco's
    engineering, for which I have the highest respect.
[4] The most common cause of errors that get more frequent with
    increasing frame size is misconfigured clocks in the telco
    domain (i.e.
    the two CSU/DSU's are not slaved to the same master clock).
    This can happen either by misconfiguring a modem (enabling
    one of them as a clock master when telco is providing clock)
    or by a mis-set switch in any telco MUX that the link passes
    through. When this happens, the clock phase is slowly drifting
    in and out of sync. Often, the slip will be less than one bit
    per million, causing you to have "a few bad minutes every two
    or three hours".
--
/ Lars Poulsen, SMTS Software Engineer
  CMC Rockwell  lars@CMC.COM
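
A back-of-the-envelope sketch of the clock-slip arithmetic in footnote
[4], for anyone who wants to plug in their own numbers. It rests on
assumptions that are not in the post: a line rate of 56 kbit/s (the
post states none), and a simple model in which the bad period recurs
each time the two free-running clocks drift apart by one full bit
period. The function names are made up for illustration.

```python
def seconds_per_bit_slip(line_rate_bps: float, fractional_offset: float) -> float:
    """Time for two unsynchronized clocks to drift apart by one bit period.

    fractional_offset is the relative frequency difference between the
    clocks, e.g. 1e-6 for "one bit per million".
    """
    if line_rate_bps <= 0 or fractional_offset <= 0:
        raise ValueError("line rate and offset must be positive")
    # The clocks disagree by (rate * offset) bits per second.
    return 1.0 / (line_rate_bps * fractional_offset)


def offset_for_slip_interval(line_rate_bps: float, interval_s: float) -> float:
    """Fractional offset that yields one bit slip every interval_s seconds."""
    if line_rate_bps <= 0 or interval_s <= 0:
        raise ValueError("line rate and interval must be positive")
    return 1.0 / (line_rate_bps * interval_s)


if __name__ == "__main__":
    rate = 56_000  # ASSUMED line rate in bit/s, not taken from the post
    for offset in (1e-6, 1e-7, 1e-8, 1e-9):
        t = seconds_per_bit_slip(rate, offset)
        print(f"offset {offset:.0e}: one bit slip every {t:,.0f} s ({t / 3600:.2f} h)")
    # Offset that would give one slip cycle every 2.5 hours at this rate.
    print(f"{offset_for_slip_interval(rate, 2.5 * 3600):.2e}")
```

Under these assumptions, a cycle of two or three hours corresponds to
an offset far below one part per million, which agrees with the "less
than one bit per million" remark. The real figure depends on the actual
line rate and on how much phase error the receiver tolerates.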