Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!zephyr.ens.tek.com!orca.wv.tek.com!leia!opus!johnt
From: johnt@opus.WV.TEK.COM (John Theus;685-2564;61-183;625-6654;hammer)
Newsgroups: comp.arch
Subject: Re: Futurebus+ @ 500MBytes/sec
Message-ID: <280@leia.WV.TEK.COM>
Date: 3 Jan 90 23:14:58 GMT
References: <276@leia.WV.TEK.COM> <33845@mips.mips.COM> <278@leia.WV.TEK.COM> <7863@dime.cs.umass.edu>
Sender: johnt@leia.WV.TEK.COM
Reply-To: johnt@opus.WV.TEK.COM (John Theus)
Organization: Tektronix, Inc., Wilsonville, OR
Lines: 81

In article <7863@dime.cs.umass.edu> yodaiken@freal.cs.umass.edu (victor yodaiken) writes:
>In article <278@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:
>>The high speed data transfer protocol Futurebus+ uses is called packet mode
>>and it was invented by Emil Hahn of Signetics. This protocol uses source
>>synchronous transmission without transmitting any clock. ...
>
>How exactly does this work? References?

The only references are the Futurebus+ spec itself, and the published
working group meeting minutes where Emil presented the papers on his
protocol.

The packet data transport protocol was designed to move data as fast as
possible with a minimum feature set. The protocol does not allow sub-word
operations; only 32, 64, 128 or 256 bit wide words can be transferred.
No lock operations can be done when using this protocol. Data is
transferred in blocks of 2, 4, 8, 16, 32 or 64 words. The block length
is signalled at the start of the transfer.

The transfer protocol is very similar to the asynchronous protocol used
on RS-232. If we just think about an individual bit for now, the sender
transmits its data at the frequency of an on-board clock. As with RS-232,
the frequency must be known by both the transmitter and the receiver in
advance. The Futurebus+ protocols provide a mechanism for selecting one
of two such frequencies on a transaction-by-transaction basis.

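
As a rough illustration of the restrictions listed above, here is a minimal
Python sketch that checks a requested transfer against the permitted word
widths and block lengths and works out the payload size. The names
(WORD_WIDTHS_BITS, BLOCK_LENGTHS_WORDS, packet_payload_bytes) are made up
for this example and are not taken from the Futurebus+ spec.

```python
# Hypothetical names for illustration only; not from the Futurebus+ spec.
WORD_WIDTHS_BITS = (32, 64, 128, 256)        # no sub-word operations
BLOCK_LENGTHS_WORDS = (2, 4, 8, 16, 32, 64)  # signalled at start of transfer


def packet_payload_bytes(word_bits, block_words):
    """Return the payload size in bytes of one packet, or raise ValueError
    if the word width or block length is not one the protocol allows."""
    if word_bits not in WORD_WIDTHS_BITS:
        raise ValueError(f"unsupported word width: {word_bits} bits")
    if block_words not in BLOCK_LENGTHS_WORDS:
        raise ValueError(f"unsupported block length: {block_words} words")
    return (word_bits // 8) * block_words


if __name__ == "__main__":
    print(packet_payload_bytes(32, 2))     # smallest packet: 8 bytes
    print(packet_payload_bytes(256, 64))   # largest packet: 2048 bytes
```

So the smallest legal packet carries 8 bytes and the largest 2048 bytes;
anything narrower than a 32-bit word is rejected outright.
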
To start data transmission, the sender transmits a sync bit which is a
logic one. The data is encoded using NRZI, where a logic one is
represented by an edge transition during a datum cell, and a logic zero
is represented by no transition. Therefore to start a packet, an edge is
sent followed by the encoded data, and concluded by an even longitudinal
parity bit. When parity is correct, the signal line is left in the logic
zero state.

The receiver has its own on-board clock that runs at the same frequency
as the sender. Both sender and receiver must have clock frequency
tolerances of 0.01% or better. When the receiver sees the sync bit at the
start of a packet, its logic sets a precision delay equal to the phase
difference between the sync bit and its on-board clock. Thereafter, the
logic uses the on-board clock plus the delay to define the datum cell
positions for sampling the rest of the data. The maximum packet length is
limited by the drift that occurs between the 2 clock sources.

Now multiply the sending and receiving circuitry by the number of bits in
a parallel word. Note that there is only 1 on-board clock source, but N
(where N equals the number of bits/word) independently settable delays in
the receiver. After the individual bits are captured in the receiver,
additional stages of logic are used to synchronize the bits into a
parallel word.

Clearly this is not a protocol to implement in discrete logic, and
silicon companies are hard at work building the parts necessary to run
this protocol. The Futurebus+ spec requires a minimum clock frequency of
60 MHz, which translates to 60 Mtransfers/sec. We expect the first
silicon to do better than this.

The bandwidth utilization efficiency of this protocol varies greatly
based on the packet length, from 50% for a 2 word packet to 97% for a 64
word packet. It is possible to sustain the 97% efficiency over transfers
that are much longer than 64 words by using multiple packet mode.

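
To make the single-line framing and the efficiency figures concrete, here
is a small Python sketch. It is only a model under stated assumptions, not
the spec's definition: the line is assumed to idle at logic zero, the
parity bit is assumed to be chosen so that the total number of ones (sync
bit included) is even, which is what returns the line to zero, and the
per-packet overhead is assumed to be exactly two cells (sync and parity).
That last assumption is at least consistent with the 50% and 97% figures
quoted above. All function names are invented for the example.

```python
def nrzi_encode(data_bits, idle_level=0):
    """Return the line level in each cell: sync, data bits, then parity.
    A logic one toggles the line; a logic zero leaves it unchanged."""
    cells = [1] + list(data_bits)       # sync bit is a logic one
    cells.append(sum(cells) % 2)        # assumed: even parity over sync + data
    level, levels = idle_level, []
    for bit in cells:
        if bit:
            level ^= 1                  # edge transition in this cell
        levels.append(level)
    return levels


def nrzi_decode(levels, idle_level=0):
    """Recover the data bits and a parity-ok flag from sampled line levels."""
    prev, bits = idle_level, []
    for level in levels:
        bits.append(1 if level != prev else 0)
        prev = level
    sync, *data, _parity = bits
    ok = sync == 1 and sum(bits) % 2 == 0
    return data, ok


def efficiency(words, overhead_cells=2):
    """Fraction of cells carrying data, assuming two overhead cells."""
    return words / (words + overhead_cells)


def worst_case_drift_cells(cells, tolerance=1e-4):
    """Drift in cell times if both 0.01% clocks are off in opposite directions."""
    return cells * 2 * tolerance


if __name__ == "__main__":
    levels = nrzi_encode([1, 0, 1, 1])
    print(levels)                        # [1, 0, 0, 1, 0, 0], line ends at zero
    print(nrzi_decode(levels))           # ([1, 0, 1, 1], True)
    print(efficiency(2), round(efficiency(64), 2))   # 0.5 0.97
    print(worst_case_drift_cells(66))    # about 0.013 of a cell
```

With these assumptions a full 64-word packet (66 cells) accumulates at
most about 1.3% of a cell time in clock drift. The model ignores jitter,
skew and the resolution of the receiver's delay element, so the real
margin will be smaller.
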
Multiple packet mode allows packets to be chained together back-to-back
with no lost clocks, as long as a single source is transmitting all the
packets. While a packet is being transmitted, the command, status and
compelled handshake signals are used to request new packets and
acknowledge new packets, including their cache attributes. The requesting
process can occur asynchronously with respect to the packet currently
being transmitted and also out of phase. By this I mean requests can be
either in lock step with their packet transfer, or 1 or more packets
ahead.

Cache coherence is maintained during multiple packet mode and
intervention is also supported. During a single transaction there can be
multiple packet sources due to intervention. When a packet source change
is made, at least 1 clock is lost in the change-over. A good example of
multiple packet sources during a single transaction would be flushing a
dirty page, one with dirty lines in several different caches, back to a
disk subsystem. This protocol allows a single transaction to remove the
page from memory and the caches, and invalidate the caches.

John Theus                              johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator
Tektronix, Inc.
Interactive Technologies Div. - shipping the Futurebus-based XD88 workstations