Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <958@winchester.UUCP>
Date: Sun, 22-Nov-87 04:34:45 EST
Article-I.D.: winchest.958
Posted: Sun Nov 22 04:34:45 1987
Date-Received: Wed, 25-Nov-87 04:39:00 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 67

In article <380@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>This discussion needs a new title...
>
>There are two reasons to share functional units.
> - cost, or, if you will, duty cycle.
> - simplicity ( in the sense of RISCness ).
>
>The duty cycle argument says that if a unit is rarely used, then you get a
>more effective design by sharing it among all the instruction-issue units.
>Note that a lot of the average Cray sits idle while the rest is being
>useful.  The counter-argument is that decreasing {prices, power consumption,
>etc} make sharing less of a win. Plus, sharing puts constraints on packaging
>- you have to get there from here.

>If you assume a single-chip CPU, I guess it's a bad idea.

That's the critical observation, and observe that an increasing piece
of the computing spectrum is being dominated by single-chip CPUs,
whose design tradeoffs are very different from having boards full of
[TTL, ECL, etc] logic.  For example, if you want to micro-time-slice N
processes, you must provide N sets of the highest-speed state in the
memory hierarchy [registers], and in fact, you'd probably want
N sets of caches also.  [Think about having N processes thrashing
around interleaved in the same cache: it is hard to see how this
will help your hit rates very much. TLBs likewise]  If you were building CPUs
that were multiple boards anyway, it might well be possible to replicate
the registers without incurring awful speed penalties: there will be
a limit, but certainly, successful systems have been built this way,
if only to minimize context switching time. Board yields don't drop
like a stone just because you used a little more space.
On the other hand, if it's VLSI, you can be up against serious limits,
and you have to think hard about what's on the chips.
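
To make the cache argument concrete, here is a toy sketch, not a model of
any real machine.  It assumes one shared, fully-associative LRU cache, and
N processes that each loop over a private working set, with a switch to the
next process on every reference.  The function name hit_rate and all the
sizes (64 cache lines, 24 lines per working set, 50 passes) are made up for
illustration.

```python
from collections import OrderedDict

def hit_rate(n_procs, cache_lines=64, working_set=24, passes=50):
    """Hit rate of one shared LRU cache under per-reference interleaving."""
    cache = OrderedDict()  # keys are (process id, line number), oldest first
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            for pid in range(n_procs):  # switch process on every reference
                key = (pid, line)
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)  # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recently used
    return hits / accesses

for n in (1, 2, 4, 8):
    print(n, round(hit_rate(n), 3))
```

With these assumed sizes, one or two processes fit in the cache and miss
only on the first pass.  At four processes the combined working set is 96
lines against 64 in the cache, and a cyclic reference pattern under LRU
then misses on every reference.  Real caches and real programs are messier,
but the cliff is the point: interleaving does nothing for your hit rate
unless the cache grows with N.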

Finally, here are the reasons why the "single-chip" observation is
the critical one.  I might be accused of bias on the following conjectures,
but I don't think they're too far out of line:

1) Each year, an increasing proportion of newly-installed computers
(both units and $$) will be based on single-chip CPUs.

2) Single-chip solutions already dominate the low end, and they keep
moving up.  The only way some of the existing architectures compete
there is by VLSIing as quickly as possible [microVAXen, for example].

3) Solutions that are not single-chip (or very small chip count)
will increasingly be:
	a) Highest-end supercomputers
	b) Upward extensions of existing product lines that didn't start life
	as single-chip CPUs
	c) "Unusual" architectures in the mini-super arena, which can often
	support anything if it solves some class of problem sufficiently more
	cost-effectively than the available alternatives.

4) It's hard to believe there will be ANY more new computer architectures
in the low-to-mid range of computing that aren't single-chip VLSI micros.
(Oops: qualify that: SUCCESSFUL architectures).  Note that low-to-mid
range means shippable 10-mips uniprocessors in 1987, 20-mips in 1988,
more than 40 in 1989.

5) To summarize: for general-purpose computing, the time-slicing hardware
approach seems doomed to niches at best, because it runs right against
the likely design trends of the next few years.  This does leave the
question of identifying the niches that might be possible.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086