Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!spar!hunt
From: hunt@spar.SPAR.SLB.COM (Neil Hunt)
Newsgroups: comp.arch
Subject: Self timed processors (was Re: Cycle stretching)
Message-ID: <810@spar.SPAR.SLB.COM>
Date: 19 Feb 88 01:58:20 GMT
References: <844@daisy.UUCP> <20409@amdcad.AMD.COM> <1232@alliant.Alliant.COM>
Reply-To: hunt@spar.UUCP (Neil Hunt)
Organization: SPAR - Schlumberger Palo Alto Research
Lines: 88

In article <1232@alliant.Alliant.COM> lackey@alliant.UUCP (Stan Lackey) writes:
>Actually, I once heard a proposal to make a microprocessor totally
>asynchronous, with logic added to determine when each stage of logic was
>complete, and use that to start the next stage. It would take advantage of
>the fact that an ALU might be done sooner when adding small numbers, and lots
>of times the numbers added are small (compared to the total size of the
>data path). "Self-timed" is what it was called.
>An interesting idea, but likely wouldn't work too well in a pipeline, and
>would be difficult to interface to. -Stan

I think that it would actually work rather well in a pipeline, with a
little care.

First, to recap on asynchronous signalling: an event is indicated by a
signal transition on a wire (in either direction). In the simplest form
of signalling, two wires are used for each bit of data. A transition on
one wire indicates the transmission of a one bit, while a transition on
the other wire indicates the transmission of a zero bit. Thus a single
transition signals not only the arrival of an event, but also the type
of event. The receiving unit signals back along a single wire that the
data has been accepted, and that more may be sent.

To conserve wires, a data bundle is sometimes used. Here the bits of
data are put on a bundle of wires in the conventional manner, using
level signalling, and a single transition on an event wire signals the
arrival of new stable data to the next stage. Again, an acknowledgement
transition on a return wire is used.
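
The two-wire scheme described above can be sketched as a toy software
model. This is only an illustration under stated assumptions: the class
TwoWireLink, its methods, and the helper transfer() are invented names,
wire levels are modelled as 0/1 integers, and an "event" is any change
of level in either direction. It is not a circuit description.

```python
class TwoWireLink:
    """Toy model of a one-bit link using transition signalling.

    Hypothetical illustration: three wires, each holding a level (0 or 1).
    Only *changes* of level carry meaning, never the level itself.
    """

    def __init__(self):
        self.one = 0    # a transition here means "a 1 bit was sent"
        self.zero = 0   # a transition here means "a 0 bit was sent"
        self.ack = 0    # a transition here means "accepted, send more"
        # Each side remembers the last level it saw, to detect transitions.
        self._rx_one = 0
        self._rx_zero = 0
        self._tx_ack = 0
        self._pending = False  # True between a send and its acknowledgement

    def send(self, bit):
        """Sender: toggle exactly one data wire to transmit one bit."""
        if self._pending:
            raise RuntimeError("previous bit not yet acknowledged")
        if bit:
            self.one ^= 1
        else:
            self.zero ^= 1
        self._pending = True

    def receive(self):
        """Receiver: return the newly arrived bit (or None), then acknowledge."""
        if self.one != self._rx_one:
            self._rx_one = self.one
            bit = 1
        elif self.zero != self._rx_zero:
            self._rx_zero = self.zero
            bit = 0
        else:
            return None      # no transition, so no event has arrived
        self.ack ^= 1        # toggle the return wire to acknowledge
        return bit

    def sender_sees_ack(self):
        """Sender: notice an acknowledge transition and allow the next send."""
        if self.ack != self._tx_ack:
            self._tx_ack = self.ack
            self._pending = False
            return True
        return False


def transfer(bits):
    """Push a sequence of bits across one link, one handshake per bit."""
    link = TwoWireLink()
    received = []
    for b in bits:
        link.send(b)
        received.append(link.receive())
        if not link.sender_sees_ack():
            raise RuntimeError("missing acknowledgement")
    return received
```

Calling transfer([1, 0, 0, 1]) returns the same list of bits. Note that
sending two equal bits in a row still produces two distinct events,
because each send toggles the wire rather than setting a level.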

Each section of the pipeline has event connections to the units
preceding and following it, which signal the availability and
consumption of each data item.

Consider a linear pipeline of processing elements. Data enters at one
end, and propagates through the stages. Its speed of propagation is
limited by the speed of the processing stages, and by the need to wait
until the next stage is available. This means that the pipeline will
run correctly at the speed of the slowest component; in a synchronous
system, the worst-case delay of that component would have set the clock
frequency. But if the slowest component speeds up, perhaps because it
is processing data which involves less propagation up the carry chain,
the whole pipeline speeds up to take advantage of the smaller delays.

The problem with pipelines running in a self timed fashion concerns
external conditions. The obvious example is a branch instruction; in a
synchronous system, there is a known number of branch delay slots,
which can be filled or empty, squashed, predicted, etc. The machine is
designed to throw away the wasted cycles in an incorrectly predicted
branch. But in a self timed system, it is not possible to say how many
instructions could be in the pipeline when the branch takes the
unpredicted direction. (A slower instruction could have entered the
pipeline, and be lagging behind a fast branch instruction, or several
fast instructions could all be bumper-to-bumper behind the branch
instruction.)

The answer is to make the relationships between the stages explicit,
and represent them as additional signalling connections. For example,
we could have some logic maintaining a state of the pipeline: either
full and running, or flushing discarded instructions. When a taken
branch is encountered, this is set to flushing mode. A signal which
arrives with the new stream of instructions from the memory system
resets this to the running state. The state of this unit controls
whether the results of computations are written or discarded.
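
The running/flushing logic just described can be sketched in a few
lines. This is a minimal illustration, not a design: the function
write_back(), the tuple layout, and the state names are all assumptions
made for the example, and it handles only one outstanding taken branch
at a time.

```python
RUNNING = "running"
FLUSHING = "flushing"


def write_back(arrivals):
    """Decide which results are written and which are discarded.

    `arrivals` is a sequence of (kind, value, new_stream) tuples in the
    order they reach the write-back end of the pipeline (a hypothetical
    encoding for this sketch):
      kind       -- "op" or "taken_branch"
      value      -- the result an "op" would write (ignored for branches)
      new_stream -- True on the first instruction fetched from the branch
                    target; this models the signal that arrives with the
                    new instruction stream from the memory system
    """
    state = RUNNING
    written = []
    for kind, value, new_stream in arrivals:
        if new_stream:
            state = RUNNING      # new stream has arrived: stop discarding
        if state == FLUSHING:
            continue             # wrong-path instruction: discard result
        if kind == "taken_branch":
            state = FLUSHING     # everything behind the branch is stale
        else:
            written.append(value)
    return written


# One wrong-path instruction was in the pipeline behind the branch.
short = [
    ("op", "a", False),
    ("taken_branch", None, False),
    ("op", "x1", False),
    ("op", "t1", True),
    ("op", "t2", False),
]

# Three wrong-path instructions were bumper-to-bumper behind the branch.
long = [
    ("op", "a", False),
    ("taken_branch", None, False),
    ("op", "x1", False),
    ("op", "x2", False),
    ("op", "x3", False),
    ("op", "t1", True),
    ("op", "t2", False),
]
```

Both write_back(short) and write_back(long) return ["a", "t1", "t2"]:
the number of instructions behind the branch never has to be known in
advance.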

In this way, regardless of the number of instructions actually in the
pipeline when the branch was taken, the processor can start to execute
the new stream as soon as it starts to arrive in the processor; there
is no need to wait for the longest possible time which it might take
for the pipeline to flush itself, as the entire processor is self
timed.

Appropriate use of FIFOs and signal acknowledgements takes care of the
situation where the processor might have more than one taken branch in
the pipeline at once, which might, without care, lead to the signal for
the earlier branch being interpreted prematurely as the OK to start
using instructions after the second branch.

Concerning interfacing: many system buses are currently asynchronous,
offering the same advantage of being able to use the speed of the
cheaper operations, while not being limited to the slowness of the more
expensive operations except when they are actually being performed.
With a synchronous processor, some of this advantage is lost, as the
asynchronous delays on the bus must be quantised to whole clock cycles.
Would it not be better to have the entire system running in an
asynchronous manner?

I think that this is in fact rather an exciting possibility.

Neil/.

	hunt@spar.slb.com
     ...{amdahl|decwrl|hplabs}!spar!hunt