Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!rice!uw-beaver!zephyr.ens.tek.com!orca.wv.tek.com!leia!opus!johnt
From: johnt@opus.WV.TEK.COM (John Theus)
Newsgroups: comp.arch
Subject: Re: Futurebus+ @ 500MBytes/sec
Keywords: Futurebus+
Message-ID: <285@leia.WV.TEK.COM>
Date: 11 Jan 90 02:35:43 GMT
References: <7863@dime.cs.umass.edu> <280@leia.WV.TEK.COM> <136.filbo@gorn.santa-cruz.ca.us>
Sender: johnt@leia.WV.TEK.COM
Reply-To: johnt@opus.WV.TEK.COM (John Theus)
Organization: Tektronix, Inc., Wilsonville, OR
Lines: 92

In article <136.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>In article <280@leia.WV.TEK.COM> John Theus writes:
>>The receiver has its own on-board clock that runs at the same frequency as
>>the sender. Both sender and receiver must have clock frequency tolerances
>>of 0.01% or better. When the receiver sees the sync bit at the start of a
>>packet, its logic sets a precision delay equal to the phase difference
>>between the sync bit and its on-board clock. Thereafter, the logic uses
>>the on-board clock plus the delay to define the datum cell positions for
>>sampling the rest of the data. The maximum packet length is limited by
>>the drift that occurs between the 2 clock sources.
>
>Why isn't one more line used to transmit the sender's idea of the data
>clock?
>[...]
>

There are at least 2 major reasons why we don't ship a clock signal with
the data. One is a fundamental performance limiter, while the other is
related to the data encoding scheme we use.

However, we didn't get to where we are today overnight, and in fact a
little over a year ago we started out with a separate clock signal when
I wrote the first non-compelled protocol proposal.

What we've learned from evaluating transfer protocols is that the
fundamental performance limiter is caused by signal skew (assuming a
clean electrical environment). Skew is the difference in time between
the arrival of two signals from a common source.
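
To make the definition concrete, here is a minimal sketch in Python. It
is only an illustration of the definition above, not anything from the
Futurebus+ spec: the function name and the arrival times are made up for
the example, and it simply extends the two-signal definition to a group
of signals by taking the latest arrival minus the earliest.

```python
def skew_ns(arrival_times_ns):
    """Skew of a group of signals launched from a common source:
    latest arrival time minus earliest arrival time (in nsec)."""
    return max(arrival_times_ns) - min(arrival_times_ns)


# Hypothetical arrival times (nsec) at the receiver for four signals
# that left the same source at the same instant.
arrivals = [12.0, 14.5, 13.0, 17.0]
print(skew_ns(arrivals))  # 5.0
```

The receiver can't safely sample until the slowest signal has arrived,
so whatever this number is comes straight out of the usable cycle time.
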
The major sources of skew are variations in the propagation delay
through logic and through the physical environment. In the Futurebus+
environment, it takes several bus transceiver chips to make a 32 bit
wide data path. The limiting factor here is power dissipation. 9 bits
is near the limit for present BTL transceivers with normal commercial
cooling practices. The skew through these chips is their spec'd maximum
propagation delay minus their minimum propagation delay. The best BTL
transceivers available today have a skew of 5 nsec. So just accounting
for getting on and off the bus introduces 10 nsec of skew, which is all
lost time. In addition, the bus itself introduces skew due mainly to
differences in capacitive loading on each line. After including the
skews from all the other parts in the logic path, you're left with
pretty poor performance. Also notice that there is no difference here
based on signal type. The skews exist for both clock to data and data
to data.

We identified 2 classes of skew elimination techniques, which I'll call
chip localized and bit independent.

The chip localized technique takes advantage of the fact that you can
hold skews to a much smaller value on a single chip than across multiple
chips. A proposal was made to have a clock signal per transceiver
(8 bits + parity + clock), which localizes the skew to what can be done
on a single chip. Numbers in the range of 1 nsec of skew were believed
possible. This technique was eventually discarded primarily due to its
physical overhead. Although the silicon was very simple for this
technique, the cost in power, pins and real estate was judged too high.
We agreed that complex silicon was better than a more complex physical
environment. Farther down the list of objections was that this
technique did not account for bus skew.

The bit independent techniques evolved a little more slowly. The first
idea was to use an embedded clock such as one of the run length limited
encodings.
This idea didn't last long when people started thinking about building
a phase locked loop per bit at several times the bit frequency.
Eventually, Emil Hahn of Signetics realized that you don't need a clock
in any form on the bus and he proposed the scheme that's in the
Futurebus+ spec and which I talked about in an earlier posting.

The other point I want to make about transmitting the clock concerns the
required bandwidth and signal fidelity. When I previously talked about
our minimum required clock rate of 60 MHz, that's the rate at which data
is clocked onto the bus. The bandwidth of the data itself is one-half
this frequency. I also previously stated that the limit for our packet
protocol is the electrical environment, and somewhere below 10 nsec per
word things start to fall apart. Putting these 2 bits of information
together says you don't ship a single edge clock with the data or you
have to halve your data bandwidth due to the electrical limitations.

As your example showed, you can use a two edge clock, which we do for
our slower compelled protocol. However, at high speeds the variation in
a signal's propagation delay between its zero and one levels becomes
very significant. This skew within the clock signal, or more precisely
its duty cycle precision, becomes a limiting factor. The precision
required by the Futurebus+ packet protocol prevents the use of a 2 edge
clock. There are several approaches to solving this, including
differential clocks and 2 half frequency clocks 180 degrees out of
phase, but each has its own set of problems.

One final point, a 0.01% clock oscillator is an industry standard
tolerance, and it's not a big deal.

John Theus                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations