Path: utzoo!attcan!uunet!dino!ux1.cso.uiuc.edu!brutus.cs.uiuc.edu!apple!apple.com!pauls
From: pauls@apple.com (Paul Sweazey)
Newsgroups: comp.arch
Subject: Re: Futurebus+ @ 500MBytes/sec
Message-ID: <6149@internal.Apple.COM>
Date: 12 Jan 90 16:22:14 GMT
Sender: usenet@Apple.COM
Organization: Apple Computer, Inc.
Lines: 132
References: <7863@dime.cs.umass.edu> <280@leia.WV.TEK.COM> <136.filbo@gorn.santa-cruz.ca.us> <285@leia.WV.TEK.COM>

FUTUREBUS EMBEDDED CLOCK: THE REAL SCOOP FROM A FALLEN ANGEL

We each have different views of history, and I have tried to stay out of these discussions, but the Futurebus discussion has led to issues that I used to live and breathe for a living.

In article <285@leia.WV.TEK.COM> johnt@opus.WV.TEK.COM (John Theus) writes:

> However, we didn't get to
> where we are today overnight, and in fact a little over a year ago we started
> out with a separate clock signal when I wrote the first non-compelled
> protocol proposal.

The parallel protocol spec that I wrote, which was based directly on your first non-compelled proposal, is dated 7 July 88.

> A proposal was made to have
> a clock signal per transceiver (8 bits + parity + clock), which localizes
> the skew to what can it done on a single chip.

I believe that this was first seriously and publicly proposed by RV Balakrishnan and Dave Hawley during the summer of 1988.

> The bit independent techniques evolved a little more slowly. The first
> idea was to use an embedded clock such as one of the run length limited
> encodings. This idea didn't last long when people started thinking about
> building a phase locked loop per bit at several times the bit frequency.
> Eventually, Emil Hahn of Signetics realized that you don't need a clock in
> any form on the bus and he proposed the scheme that's in the Futurebus+ spec
> and which I talked about in an earlier posting.

RV Balakrishnan suggested embedded-clock synchronization as the ultimate solution to skew in February 1988.
I devised and proposed embedded-clock synchronization to the SuperBus Study Group (now SCI) in March 1988, privately to Futurebus Committee members in May 1988, and at various times in Futurebus public forums through December 1988.
Emil Hahn devised a feasible implementation of embedded-clock synchronization between November 1988 and January 1989.

A HISTORY/PolySci LESSON:

In the fall of 1987 the Futurebus (IEEE896.1-1987) was just being finished. I was serving as Coordinator of the Futurebus Cache Coherence Task Group. There was little active interest in speeding it up, but I could see that the real-world performance would not match the idealized theory or the marketing hype, so I started another IEEE project called the SuperBus Study Group.

In February 1988, before SuperBus had become SCI and when it was still assumed to be a bus, I proposed the use of a synchronizer (clock) per transceiver to eliminate interdevice skew. RV Balakrishnan of National Semiconductor (Balu, the inventor of BTL logic) was in attendance, and he said (half in jest) that the only way to do better would be to encode a clock in every bit. Until that day, this alternative had only been mentioned, along with optical fibers and radiation baths, as an unrealistic solution for a parallel bus. Since the stated bandwidth goal of SuperBus was 1 gigabyte per second, I began to pursue embedded clocking seriously.

(SuperBus is now IEEE P1596 Scalable Coherent Interconnect (SCI), chaired by Dave Gustavson of SLAC and co-chaired by Dave James of Apple. It is now a point-to-point interconnect of arbitrary topology, and it REALLY WILL reach 1 gigabyte per second.)

On April 22 I published a memo inside National Semiconductor (I worked there at the time), which I copied to some Futurebus committee members, including the Futurebus committee chairman (also then a National employee). In it I described the theory, benefits, and implementation of embedded clock data transmission in an enhanced Futurebus.
One week later I published an expanded report on the subject, entitled "NSC Multiprocessing Performance Roadmap". The report described stages of enhancements to Futurebus that would allow the real-world performance to achieve the marketing hype. In it I estimated that burst rates of 250 to 300 megabytes per second (32 bits wide) would be achievable with the first generation of embedded clock silicon.

While the proposal was accepted as credible and viable within the NSC technical community, it was determined by the Futurebus Committee contingent at National to be heretical--"a threat to all that we have worked for"--because it implied that Futurebus-1987 could not reach those speeds without further enhancement (which, of course, was quite true).

My proposal for embedded-clock transceivers involved the use of precision delay elements and quadrature sampling of each bit stream, which did not require PLL locking to the bit streams.

By the fall of 1988 I no longer held any committee office, and I was no longer directly involved in Futurebus product planning at work, leaving me free to concentrate on technical issues without regard to politics. I discussed technology freely, including embedded-clock data transfer, with many people, including Emil Hahn of Signetics.

Meanwhile the US Navy began a process of adopting Futurebus, pushing the need for it to become real SOON, and for it to deliver all of its promises. At the December 1988 Futurebus meeting in San Diego, I gave a presentation offering two proposals: either (1) backward-compatible enhancements to Futurebus-1987 as Theus had proposed, or (2) more aggressive enhancements using either clock-per-chip or embedded-clock techniques. Because of the new industry pressure that the Navy created, any changes had to be finalized within 8 weeks, so alternative (1) was chosen. Nevertheless, Hahn of Signetics and Balu of National agreed in that meeting to analyze both techniques and report back at a later meeting.
At the Santa Clara meeting in January 1989 they came back with two different answers, and Signetics won, based on a data recovery method, similar to mine but not the same, that Emil was confident he could implement. I was not involved in the decision making or analysis process; two weeks after the San Diego meeting I went to work for Apple Computer.

Emil's solution involves the use of dynamically settable delay elements, also uses no PLL locking to the bit streams, and may need as little as 1/4 of the FIFO storage of my proposed implementation.

So why bring this all up now? I didn't get a patent for my embedded-clocking contributions, or a bonus check, or stock options, or a raise. So I'll settle for glory. Embedded clocking is debatably the breakthrough performance feature of the "last great backplane bus", and I would hope that the gang remembers that I helped get it started.

To those of you with radical breakthrough ideas: be persistent but be very patient. To the receivers of those ideas: File, don't trash. There are gems among the gravel.

Greetings to Theus, Balu, Hahn, Hawley, Gustavson, James, and the rest. They are the best in the bus business!

Paul Sweazey
Apple Computer, Inc.
pauls@apple.com
(408)-974-0253