Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bbn!rochester!PT.CS.CMU.EDU!K.GP.CS.CMU.EDU!lindsay
From: lindsay@K.GP.CS.CMU.EDU (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <380@PT.CS.CMU.EDU>
Date: Sat, 21-Nov-87 13:22:31 EST
Article-I.D.: PT.380
Posted: Sat Nov 21 13:22:31 1987
Date-Received: Mon, 23-Nov-87 04:26:05 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM> <988@edge.UUCP> <393@sdcjove.CAM.UNISYS.COM> <3801@ptsfa.UUCP>
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 34

This discussion needs a new title. It started with the Denelcor style of shared-functional-unit multiprocessors, wandered all unbeknownst into conventional multiprocessors (huh?!?), and now it's a history of IBM nomenclature (pardon ????).

To get back to the original subject: there are two reasons to share functional units.

- cost, or, if you will, duty cycle.
- simplicity (in the sense of RISCness).

The duty cycle argument says that if a unit is rarely used, then you get a more effective design by sharing it among all the instruction-issue units. Note that a lot of the average Cray sits idle while the rest is being useful. The counter-argument is that falling {prices, power consumption, etc.} make sharing less of a win. Plus, sharing puts constraints on packaging: you have to get there from here.

The simplicity argument says that since successive clocks are spent on behalf of different threads, the pipelines need no interlocks. This should lead to lean, mean pipelines with good clock rates. The problem is that to do this, you would have to put interlocks on the pipe entrances, to resolve the asynchronous demands for service. Denelcor solved that by sharing the instruction-issue unit, using a queue. (When an answer came out, its thread became eligible for another issue.) The catch is that any single sequential program is now unable to issue instructions at the full rate. So the machine is only a win for timesharing loads, or for multithread applications.

Obviously, you can do a sort of "fork" in a few clocks. Denelcor argued that fine-grained forking would pick up wins, and Alliant seems to be getting mileage from such forking.

If you assume a single-chip CPU, I guess it's a bad idea.

--
Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science
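A toy model may make the Denelcor trade-off in the post concrete. It is a sketch under stated assumptions, not a description of the actual HEP hardware: one shared issue unit that issues at most one instruction per clock, a pipeline `depth` clocks long, and a FIFO queue of eligible threads, where a thread rejoins the queue only when its previous result comes out. The function name `simulate` and its parameters are hypothetical. Under these assumptions a thread never has two instructions in the pipe at once, so nothing inside the pipe needs an interlock; the price is that one thread alone gets only 1/`depth` of the issue rate.

```python
from collections import deque


def simulate(num_threads: int, depth: int, cycles: int) -> list[int]:
    """Count instructions issued per thread in a toy shared-issue model.

    Hypothetical assumptions (not Denelcor's actual design):
    - a single shared issue unit issues at most one instruction per clock;
    - each instruction spends `depth` clocks in the pipeline;
    - a thread becomes eligible again only when its result comes out;
    - eligible threads wait in a FIFO queue.
    """
    ready = deque(range(num_threads))  # threads eligible to issue
    in_flight = deque()                # (completion_clock, thread), in issue order
    issued = [0] * num_threads
    for clock in range(cycles):
        # Results emerging this clock put their threads back in the queue.
        while in_flight and in_flight[0][0] <= clock:
            _, thread = in_flight.popleft()
            ready.append(thread)
        # Issue one instruction from the head of the queue, if any thread is ready.
        if ready:
            thread = ready.popleft()
            issued[thread] += 1
            in_flight.append((clock + depth, thread))
    return issued


if __name__ == "__main__":
    # Fraction of clocks on which the shared issue unit did useful work.
    for n in (1, 4, 8, 16):
        issued = simulate(n, depth=8, cycles=80)
        print(n, sum(issued) / 80)
    # prints:
    # 1 0.125
    # 4 0.5
    # 8 1.0
    # 16 1.0
```

With an 8-deep pipe, a single sequential program issues on one clock in eight, and the issue unit only stays busy once at least 8 threads are available, which is the post's point that such a machine pays off for timesharing or multithread loads.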