Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!rice!uw-beaver!zephyr.ens.tek.com!orca.wv.tek.com!leia!opus!johnt
From: johnt@opus.WV.TEK.COM (John Theus)
Newsgroups: comp.arch
Subject: Re: Futurebus+ @ 500MBytes/sec
Keywords: Futurebus+
Message-ID: <285@leia.WV.TEK.COM>
Date: 11 Jan 90 02:35:43 GMT
References: <7863@dime.cs.umass.edu> <280@leia.WV.TEK.COM> <136.filbo@gorn.santa-cruz.ca.us>
Sender: johnt@leia.WV.TEK.COM
Reply-To: johnt@opus.WV.TEK.COM (John Theus)
Organization: Tektronix, Inc., Wilsonville, OR
Lines: 92

In article <136.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>In article <280@leia.WV.TEK.COM> John Theus writes:
>>The receiver has its own on-board clock that runs at the same frequency as
>>the sender.  Both sender and receiver must have clock frequency tolerances
>>of 0.01% or better.  When the receiver sees the sync bit at the start of a
>>packet, its logic sets a precision delay equal to the phase difference
>>between the sync bit and its on-board clock.  Thereafter, the logic uses
>>the on-board clock plus the delay to define the datum cell positions for
>>sampling the rest of the data.  The maximum packet length is limited by
>>the drift that occurs between the 2 clock sources.
>
>Why isn't one more line used to transmit the sender's idea of the data
>clock?
>[...]
>

There are at least 2 major reasons why we don't ship a clock signal with
the data.  One is a fundamental performance limiter, while the other is
related to the data encoding scheme we use.  However, we didn't get to
where we are today overnight, and in fact a little over a year ago we started
out with a separate clock signal when I wrote the first non-compelled
protocol proposal.

What we've learned from evaluating transfer protocols is that the fundamental
performance limiter is caused by signal skew (assuming a clean electrical
environment).  Skew is the difference in time between the arrival of two
signals from a common source.  The major sources of skew are variations in
the propagation delay through logic and through the physical environment.

In the Futurebus+ environment, it takes several bus transceiver chips to
make a 32 bit wide data path.  The limiting factor here is power
dissipation.  9 bits is near the limit for present BTL transceivers with
normal commercial cooling practices.  The skew through these chips is their
spec'd maximum propagation delay minus their minimum propagation delay.
The best BTL transceivers available today have a skew of 5 nsec.  So just
accounting for getting on and off the bus introduces 10 nsec of skew,
which is all lost time.  In addition, the bus itself introduces skew due
mainly to differences in capacitive loading on each line.  After including
the skews from all the other parts in the logic path, you're left with pretty
poor performance.  Also notice that there is no difference here based on
signal type.  The skew exists for both clock to data and data to data.
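
To make the arithmetic above concrete, here is a minimal sketch of the skew
budget.  Only the 5 nsec per-transceiver skew and the resulting 10 nsec
come from the text; the individual propagation delays and the function
names are hypothetical, chosen just to reproduce those figures.

```python
def chip_skew_ns(tpd_max_ns, tpd_min_ns):
    """Skew through one part: spec'd max minus spec'd min propagation delay."""
    return tpd_max_ns - tpd_min_ns


def path_skew_ns(stage_skews_ns):
    """Worst-case skew along a path: the per-stage skews simply add."""
    return sum(stage_skews_ns)


# Hypothetical delays, picked only so that each transceiver shows 5 nsec of skew.
driver_skew = chip_skew_ns(tpd_max_ns=7.0, tpd_min_ns=2.0)
receiver_skew = chip_skew_ns(tpd_max_ns=7.0, tpd_min_ns=2.0)

# Getting on the bus and getting off it again, before any bus or logic skew.
on_off_bus_skew = path_skew_ns([driver_skew, receiver_skew])
print(on_off_bus_skew)  # 10.0
```

Bus loading skew and the skew of the other parts in the logic path would
be further entries in the same list.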

We identified 2 classes of skew elimination techniques, which I'll call
chip localized and bit independent.  The chip localized technique takes
advantage of the fact that you can hold skews to a much smaller value on
a single chip than across multiple chips.  A proposal was made to have
a clock signal per transceiver (8 bits + parity + clock), which localizes
the skew to what can be done on a single chip.  Numbers in the range of
1 nsec of skew were believed possible.

This technique was eventually discarded primarily due to its physical
overhead.  Although the silicon was very simple for this technique, the
cost in power, pins and real estate was judged too high. We agreed that
complex silicon was better than a more complex physical environment.
Further down the list was that this technique did not account for bus
skew.

The bit independent techniques evolved a little more slowly.  The first
idea was to use an embedded clock such as one of the run length limited
encodings.  This idea didn't last long when people started thinking about
building a phase locked loop per bit at several times the bit frequency.
Eventually, Emil Hahn of Signetics realized that you don't need a clock in
any form on the bus and he proposed the scheme that's in the Futurebus+ spec
and which I talked about in an earlier posting.

The other point I want to make about transmitting the clock concerns the
required bandwidth and signal fidelity.  When I previously talked about
our minimum required clock rate of 60 MHz, that's the rate at which data
is clocked onto the bus.  The bandwidth of the data itself is one-half
this frequency.  I also previously stated that the limit for our packet
protocol is the electrical environment, and somewhere below 10 nsec per
word things start to fall apart.  Putting these 2 bits of information
together says you don't ship a single edge clock with the data or you have
to halve your data bandwidth due to the electrical limitations.
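
A rough sketch of that argument in numbers.  The 60 MHz and 10 nsec
figures are from the text; treating the 10 nsec as the narrowest pulse the
bus can pass is my simplifying assumption for illustration, and the names
are hypothetical.

```python
# Assumption: the bus cannot reliably pass a pulse narrower than this.
MIN_PULSE_NS = 10.0


def data_fundamental_mhz(clock_rate_mhz):
    """Worst-case data (0101...) completes one full cycle every two cells."""
    return clock_rate_mhz / 2.0


def max_cell_rate_mhz(min_pulse_ns, pulses_per_cell):
    """Highest cell rate when the narrowest pulse lasts cell / pulses_per_cell.

    Data alone holds its level for a whole cell (1 pulse per cell).
    A single edge clock must go high and low again within every cell (2).
    """
    return 1000.0 / (min_pulse_ns * pulses_per_cell)


print(data_fundamental_mhz(60.0))          # 30.0

data_only = max_cell_rate_mhz(MIN_PULSE_NS, pulses_per_cell=1)
with_clock = max_cell_rate_mhz(MIN_PULSE_NS, pulses_per_cell=2)
print(data_only, with_clock)               # 100.0 50.0
```

Under that assumption the clock line, not the data lines, sets the limit,
and it sets it at half the rate the data alone could sustain.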

As your example showed, you can use a two edge clock, which we do for our
slower compelled protocol.  However, at high speeds the variation in a
signal's propagation delay between its zero and one levels becomes very
significant.  This skew within the clock signal, or more precisely its
duty cycle precision, becomes a limiting factor.  The precision required by
the Futurebus+ packet protocol prevents the use of a 2 edge clock.  There
are several approaches to solving this including differential and 2 half
frequency 180 degrees out of phase clocks, but each has its own set of
problems.
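
To illustrate the duty cycle problem with a two edge clock, here is a
small sketch.  All of the numbers and names are hypothetical; the only
point taken from the text is that a different propagation delay for the
two signal levels distorts the clock seen at the receiver.

```python
def received_phases_ns(cell_ns, rise_delay_ns, fall_delay_ns):
    """High and low phase widths of a two edge clock as seen by the receiver.

    The sender toggles the clock once per cell.  A rising edge sent at t
    arrives at t + rise_delay_ns, a falling edge at t + fall_delay_ns, so
    the difference in delay lengthens one phase and shortens the other.
    """
    shift = fall_delay_ns - rise_delay_ns
    return cell_ns + shift, cell_ns - shift


# Hypothetical: 10 nsec cells, edges differing by 1 nsec in propagation delay.
high_ns, low_ns = received_phases_ns(cell_ns=10.0, rise_delay_ns=4.0,
                                     fall_delay_ns=5.0)
duty_cycle = high_ns / (high_ns + low_ns)
print(high_ns, low_ns)  # 11.0 9.0
```

Every other datum cell is then sampled against a window that is narrower
than nominal, which comes straight out of the timing margin.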

One final point: a 0.01% clock oscillator is an industry standard
tolerance, and it's not a big deal.
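
For a feel of how the clock tolerance bounds the packet length, here is a
back of the envelope sketch.  The 0.01% tolerance is from the text; the
drift budget of 10% of a datum cell is purely an assumed figure for
illustration, not a number from the Futurebus+ spec, and the names are
hypothetical.

```python
def max_packet_cells(tolerance_ppm, drift_budget_ppm_of_cell):
    """Cells that can be sampled before worst-case drift uses up the budget.

    Sender and receiver can each be off by tolerance_ppm in opposite
    directions, so the worst-case mismatch is twice the tolerance, and
    the sampling point slips by that fraction of a cell on every cell.
    """
    mismatch_ppm = 2 * tolerance_ppm
    return drift_budget_ppm_of_cell // mismatch_ppm


TOLERANCE_PPM = 100          # 0.01% expressed in parts per million
DRIFT_BUDGET_PPM = 100_000   # assumed: 10% of a cell may be lost to drift

print(max_packet_cells(TOLERANCE_PPM, DRIFT_BUDGET_PPM))  # 500
```

A tighter drift budget or a looser oscillator shortens the packet in
direct proportion.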

John Theus                                johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations