Path: utzoo!attcan!uunet!husc6!bloom-beacon!mit-vax!spectre
From: spectre@mit-vax.LCS.MIT.EDU (Joseph D. Morrison)
Newsgroups: comp.arch
Subject: Re: VLIW
Message-ID: <5097@mit-vax.LCS.MIT.EDU>
Date: 15 Nov 88 22:52:36 GMT
References: <70@armada.UUCP> <28200228@urbsdc> <5087@mit-vax.LCS.MIT.EDU> <556@m3.mfci.UUCP>
Reply-To: spectre@mit-vax.UUCP (Joseph D. Morrison)
Organization: MIT Laboratory for Computer Science, Cambridge
Lines: 139

In article <556@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>In article <5087@mit-vax.LCS.MIT.EDU> spectre@mit-vax.UUCP (Joseph D. Morrison) writes:
>>Scoreboarding is but one of several techniques for managing a
>>pipeline.  (Some alternative techniques are micro-dataflow, simple
>                                             ^^^^^^^^^^^^^^
>Would you elaborate a little on that?  Never heard of it.

Micro-dataflow is an interesting pipeline management mechanism that
was used in the IBM 360/91 computer. (I think IBM used it in one other
machine as well, but I can't remember which.)

The idea was that instructions could be started out of order, and could
finish out of order, with a "reservation station" making sure that no
dependencies were violated.

If you are interested, here are the details! (If not, hit 'n' now!)

What are reservation stations?
==============================
The CPU has several of these "reservation stations" in its hardware.
Each reservation station is either:

	(a) empty, or
	(b) contains an instruction to be performed as soon as
	    possible, in the form:

        +----+------------------+------------------+------------+
        | OP | TAG/DATA (src 1) | TAG/DATA (src 2) | TAG (dest) |
        +----+------------------+------------------+------------+

	i.e. an operation, two source operands (each of which can be
	     a valid datum or a tag), and one destination operand which
	     is always a tag.

The Instruction-Processing Algorithm
====================================
You can think of this as three processes running "in parallel" on the
hardware.  (Assume, for simplicity, that all instructions are in the
form (OP SRC1 SRC2 DEST).  Assume also that all sources and destinations
are registers.)

PROCESS 1
=========
PROCESS-1 continually loads the reservation stations.  Every time a
reservation station becomes free, PROCESS-1 loads the next instruction
from memory into the free reservation station.

To load an instruction into a reservation station:

	- Say the instruction was (+ R0 R0 R1)

	- First, generate a new tag for the solution (which will go in
	  R1). (Say we generate the tag "q".)

	- Stick the tag "q" in register R1.

	- Next, check if R0 contains data or a tag. If it contains data
	  (say the number 53) put the following in the reservation
	  station:
		(+  53 53 q)
	  If R0 contained tag "p", we would put:
		(+  p p q)
	  into the reservation station.

	- IN GENERAL:
		- A tag is always generated for the destination of each
		  instruction.

		- This tag is always placed in BOTH the destination
		  register, and in the "destination" position of the
		  reservation station.

		- The operands are copied right into the reservation
		  station if they are available, but if a tag is in an
		  operand register, that tag is placed in the
		  reservation station instead.

		- Informally, tags represent "data in transit".
PROCESS 2
=========
PROCESS-2 dispatches instructions from the reservation station.
PROCESS-2 is continually scanning the reservation stations, checking for
stations in which both source operands contain valid data.  If one is
found, the operation is "shoved into" the pipelined ALU, along with the
tag for the result.

For example, if a station contained (+ 53 53 q), the dispatcher would
shove [+ 53 53, q] into the ALU.  (The ALU goes off and does its thing,
and when it's done, the pair [106, q] will appear on the bus.)

If PROCESS-2 finds something like (+ p p q) in a reservation station,
it just ignores that station for the time being.

Every time an instruction is dispatched from a reservation station, that
station becomes empty and can be refilled by PROCESS-1.

PROCESS 3
=========
This process "watches" the bus where results come out, looking for
[result, tag] pairs.  Whenever it sees such a pair, PROCESS-3 finds all
places (in the register file and in the reservation stations) with that
tag, and shoves the result into all those places.

Discussion
==========
Notice that this is a "micro" version of a dataflow machine.

Micro-dataflow gives you as much parallelism as is permitted by any
reordering of the instructions, while properly handling any data
dependencies, BUT: degenerates to a normal interlocked pipeline if

	(a) the reservation stations become full, or
	(b) it runs out of tags

Write/write conflicts are elegantly handled, as in an ordinary dataflow
machine; if two instructions both write to R0, the result slot of each
instruction is still allocated a different tag, and the last instruction
is the one whose tag ends up in R0.  Thus "the right thing will happen".

It's really a very clever scheme!  The only bug?  It's not worth the
hardware cost.  You need associative lookup to handle copy-back of
results into the registers and reservation stations, and that takes chip
area.  You need a tag generator, and some extra datapaths...  And unless
you use a large reservation station, you really don't find that much
extra parallelism.

Details on the 360/91 system can be found in the following article:

Anderson, Sparacio and Tomasulo; "The IBM System/360 Model 91: Machine
Philosophy and Instruction-Handling", IBM Journal, January 1967.
(pp.8-24)

        Joe Morrison
--
MIT Laboratory for Computer Science     UUCP: ...!mit-eddie!vx!spectre
545 Technology Square, Room 425         ARPA: spectre@vx.lcs.mit.edu
Cambridge, Massachusetts, 02139         (617) 253-5881
--
"Back off, man; I'm a scientist!"