Path: utzoo!attcan!uunet!husc6!bloom-beacon!mit-vax!spectre From: spectre@mit-vax.LCS.MIT.EDU (Joseph D. Morrison) Newsgroups: comp.arch Subject: Re: VLIW Message-ID: <5097@mit-vax.LCS.MIT.EDU> Date: 15 Nov 88 22:52:36 GMT References: <70@armada.UUCP> <28200228@urbsdc> <5087@mit-vax.LCS.MIT.EDU> <556@m3.mfci.UUCP> Reply-To: spectre@mit-vax.UUCP (Joseph D. Morrison) Organization: MIT Laboratory for Computer Science, Cambridge Lines: 139 In article <556@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes: >In article <5087@mit-vax.LCS.MIT.EDU> spectre@mit-vax.UUCP (Joseph D. Morrison) writes: >>Scoreboarding is but one of several techniques for managing a >>pipeline. (Some alternative techniques are micro-dataflow, simple > ^^^^^^^^^^^^^^ >Would you elaborate a little on that? Never heard of it. Micro-dataflow is an interesting pipeline management mechanism that was used in the IBM 360/91 computer. (I think IBM used it in one other machine as well, but I can't remember which.) The idea was that instructions could be started out of order, and could finish out of order, with a "reservation station" making sure that no dependencies were violated. If you are interested, here are the details! (If not, hit 'n' now!) What are reservation stations? ============================== The CPU has several of these "reservation stations" in its hardware. Each reservation station is either: (a) empty, or (b) contains an instruction to be performed as soon as possible, in the form: +----+------------------+------------------+------------+ | OP | TAG/DATA (src 1) | TAG/DATA (src 2) | TAG (dest) | +----+------------------+------------------+------------+ i.e. an operation, two source operands (each of which can be a valid datum or a tag), and one destination operand which is always a tag. The Instruction-Processing Algorithm ==================================== You can think of this as three processes running "in parallel" on the hardware. (Assume, for simplicity, that all instructions are in the form (OP SRC1 SRC2 DEST). Assume also that all sources and destinations are registers.) PROCESS 1 ========= PROCESS-1 continually loads the reservation stations. Every time a reservation station becomes free, PROCESS-1 loads the next instruction from memory into the free reservation station. To load an instruction into a reservation station: - Say the instruction was (+ R0 R0 R1) - First, generate a new tag for the solution (which will go in R1). (Say we generate the tag "q".) - Stick the tag "q" in register R1. - Next, check if R0 contains data or a tag. If it contains data (say the number 53) put the following in the reservation station: (+ 53 53 q) If R0 contained tag "p", we would put: (+ p p q) into the reservation station. - IN GENERAL: - A tag is always generated for the destination of each instruction. - This tag is always placed in BOTH the destination register, and in the "destination" position of the reservation station. - The operands are copied right into the reservation station if they are available, but if a tag is in an operand register, that tag is placed in the reservation station instead. - Informally, tags represent "data in transit". PROCESS 2 ========= PROCESS-2 dispatches instructions from the reservation station. PROCESS-2 is continually scanning the reservation stations, checking for stations in which both source operands contain valid data. If one is found, the operation is "shoved into" the pipelined ALU, along with the tag for the result. For example, if a station contained (+ 53 53 q), the dispatcher would shove [+ 53 53, q] into the ALU. (The ALU goes off and does its thing, and when it's done, the pair [106, q] will appear on the bus.) If PROCESS-2 finds something like (+ p p q) in a reservation station, it just ignores that station for the time being. Every time an instruction is dispatched from a reservation station, that station becomes empty and can be refilled by PROCESS-1. PROCESS 3 ========= This process "watches" the bus where results come out, looking for [result, tag] pairs. Whenever it sees such a pair, PROCESS-3 finds all places (in the register file and in the reservation stations) with that tag, and shoves the result into all those places. Discussion ========== Notice that this is a "micro" version of a dataflow machine. Micro-dataflow gives you as much parallelism as is permitted by any reordering of the instructions, while properly handling any data dependencies, BUT: degenerates to a normal interlocked pipeline if (a) the reservation stations become full, or (b) it runs out of tags Write/write conflicts are elegantly handled, as in an ordinary dataflow machine; if two instructions both write to R0, the result slot of each instruction is still allocated a different tag, and the last instruction is the one whose tag ends up in R0. Thus "the right thing will happen". It's really a very clever scheme! The only bug? It's not worth the hardware cost. You need associative lookup to handle copy-back of results into the registers and reservation stations, and that takes chip area. You need a tag generator, and some extra datapaths... And unless you use a large reservation station, you really don't find that much extra parallelism. Details on the 360/91 system can be found in the following article: Anderson, Sparacio and Tomasulo; "The IBM System/360 Model 91: Machine Philosophy and Instruction-Handling", IBM Journal, January 1967. (pp.8-24) Joe Morrison -- MIT Laboratory for Computer Science UUCP: ...!mit-eddie!vx!spectre 545 Technology Square, Room 425 ARPA: spectre@vx.lcs.mit.edu Cambridge, Massachusetts, 02139 (617) 253-5881 -- "Back off, man; I'm a scientist!"