Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bbn!rochester!PT.CS.CMU.EDU!K.GP.CS.CMU.EDU!lindsay
From: lindsay@K.GP.CS.CMU.EDU (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <380@PT.CS.CMU.EDU>
Date: Sat, 21-Nov-87 13:22:31 EST
Article-I.D.: PT.380
Posted: Sat Nov 21 13:22:31 1987
Date-Received: Mon, 23-Nov-87 04:26:05 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM> <988@edge.UUCP> <393@sdcjove.CAM.UNISYS.COM> <3801@ptsfa.UUCP>
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 34

This discussion needs a new title. It started with the Denelcor style of
shared-functional-unit multiprocessors, wandered all unbeknownst into
conventional multiprocessors (huh?!?), and now it's a history of IBM
nomenclature (pardon ????).

To get back to the original subject:

There are two reasons to share functional units.
 - cost, or, if you will, duty cycle.
 - simplicity (in the sense of RISCness).

The duty cycle argument says that if a unit is rarely used, then you get a
more effective design by sharing it among all the instruction-issue units.
Note that a lot of the average Cray sits idle while the rest is being
useful.  The counter-argument is that decreasing {prices, power consumption,
etc} make sharing less of a win. Plus, sharing puts constraints on packaging
- you have to get there from here.
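
To put rough numbers on the duty cycle argument, here is a back-of-envelope
sketch. It assumes (my assumption, not a measurement of any real machine)
that each issue unit independently wants the unit with probability p_use on
any given clock, and that the unit serves one request per clock. The
function names are made up for illustration.

```python
def duty_cycle(n_issuers: int, p_use: float) -> float:
    """Fraction of clocks in which a unit shared by n_issuers is busy.

    Assumes independent requests, each with probability p_use per clock.
    """
    return 1.0 - (1.0 - p_use) ** n_issuers


def conflict_probability(n_issuers: int, p_use: float) -> float:
    """Probability that two or more issuers want the unit on the same clock."""
    none = (1.0 - p_use) ** n_issuers
    exactly_one = n_issuers * p_use * (1.0 - p_use) ** (n_issuers - 1)
    return 1.0 - none - exactly_one


if __name__ == "__main__":
    # A rarely used unit (5% of clocks), private versus shared eight ways.
    for n in (1, 8):
        print(n, duty_cycle(n, 0.05), conflict_probability(n, 0.05))
```

Under these assumptions, a unit that sits idle 95% of the time when private
is busy roughly a third of the time when shared eight ways, at the price of
occasional collisions that something has to arbitrate.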

The simplicity argument says that since successive clocks are on behalf of
different threads, the pipelines need no interlocks. This should
lead to lean, mean pipelines with good clock rates. The problem is that to
do this, you would have to put interlocks on the pipe entrances, to resolve
the asynchronous demands for service. Denelcor solved that by sharing the
instruction-issue unit, using a queue. (When an answer came out, its thread
became eligible for another issue.) The catch is that any single
sequential program is now unable to issue instructions at the full rate.
So, the machine is only a win for timesharing loads, or for multithread
applications. Obviously, you can do a sort of "fork" in a few clocks.
Denelcor argued that fine-grained fork would pick up wins, and Alliant seems
to be getting mileage from such forking.
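
The queue scheme can be sketched as a toy simulation. This is a simplified
model, not a description of the actual Denelcor hardware: it assumes one
shared issue unit, a single pipe of fixed depth, and one outstanding
instruction per thread. The names simulate, n_threads and pipe_depth are
mine.

```python
from collections import deque


def simulate(n_threads: int, pipe_depth: int, cycles: int) -> list:
    """Return the number of instructions issued per thread.

    One instruction issues per clock at most. A thread rejoins the ready
    queue only when its previous result comes out of the pipe.
    """
    ready = deque(range(n_threads))   # threads eligible to issue
    in_flight = deque()               # (completion_clock, thread), issue order
    issued = [0] * n_threads

    for now in range(cycles):
        # Results leaving the pipe make their threads eligible again.
        while in_flight and in_flight[0][0] <= now:
            ready.append(in_flight.popleft()[1])
        # The shared issue unit takes the thread at the head of the queue.
        if ready:
            thread = ready.popleft()
            issued[thread] += 1
            in_flight.append((now + pipe_depth, thread))
    return issued


if __name__ == "__main__":
    for n in (1, 8, 16):
        counts = simulate(n_threads=n, pipe_depth=8, cycles=80)
        print(n, "threads:", sum(counts), "issues in 80 clocks")
```

In this model a lone thread gets one issue slot per pipe_depth clocks, and
the pipe only fills once there are at least pipe_depth threads. Beyond that
point extra threads just split the same total issue rate.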

If you assume a single-chip CPU, I guess it's a bad idea.
-- 
	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science