Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <958@winchester.UUCP>
Date: Sun, 22-Nov-87 04:34:45 EST
Article-I.D.: winchest.958
Posted: Sun Nov 22 04:34:45 1987
Date-Received: Wed, 25-Nov-87 04:39:00 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 67

In article <380@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>This discussion needs a new title...
>
>There are two reasons to share functional units.
>	- cost, or, if you will, duty cycle.
>	- simplicity ( in the sense of RISCness ).
>
>The duty cycle argument says that if a unit is rarely used, then you get a
>more effective design by sharing it among all the instruction-issue units.
>Note that a lot of the average Cray sits idle while the rest is being
>useful. The counter-argument is that decreasing {prices, power consumption,
>etc} make sharing less of a win. Plus, sharing puts constraints on packaging
>- you have to get there from here.
>If you assume a single-chip CPU, I guess it's a bad idea.

That's the critical observation, and observe that an increasing piece of
the computing spectrum is being dominated by single-chip CPUs, whose design
tradeoffs are very different from having boards full of [TTL, ECL, etc] logic.

For example, if you want to micro-time-slice N processes, you must provide
N sets of the highest-speed state in the memory hierarchy [registers],
and in fact, you'd probably want N sets of caches also. [Think about
having N processes thrashing around interleaved in the same cache:
it is hard to see how this will help your hit rates very much.
TLBs likewise]

If you were building CPUs that were multiple boards anyway, it might
not be impossible to replicate the registers without incurring awful speed
penalties: there will be a limit, but certainly, successful systems have
been built this way, if only to minimize context switching time.
Board yields don't drop like a stone just because you used a little
more space. On the other hand, if it's VLSI, you can be up against
serious limits, and you have to think hard about what's on the chips.

Finally, here are the reasons why the "single-chip" observation is the
critical one. I might be accused of bias on the following conjectures,
but I don't think they're too far out of line:

1) Each year, an increasing proportion of newly-installed computers
(both units and $$) will be based on single-chip CPUs.

2) Single-chip solutions already dominate the low-end, and they keep moving
up. The only way some of the existing architectures compete there is
by VLSIing as quickly as possible [microVAXen, for example].

3) Solutions that are not single-chip (or very small chip count) will
increasingly be:
	a) Highest-end supercomputers
	b) Upward extensions of existing product lines that didn't start
	life as single-chip CPUs
	c) "Unusual" architectures in the mini-super arena, which can
	often support anything if it solves some class of problem enough
	more cost-effectively than other available ones.

4) It's hard to believe there will be ANY more new computer architectures
in the low-to-mid range of computing that aren't single-chip VLSI micros.
(Oops: qualify that: SUCCESSFUL architectures). Note that low-to-mid range
means shippable 10-mips uniprocessors in 1987, 20-mips in 1988, >40 in 1989.

5) To summarize: for general-purpose computing, the time-slicing hardware
approach seems doomed to niches at best, because it runs right against the
likely design trends of the next few years. This does leave the question
of identifying the niches that might be possible.
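
The earlier point about N processes interleaved in one cache can be made
concrete with a toy model. What follows is only a sketch under assumptions
that are mine and not measurements of any real machine: a fully associative
LRU cache of 64 lines, each process looping over its own private working set
of 24 lines, and the hardware switching process on every reference. The
names hit_rate, working_set and cache_lines are hypothetical.

```python
from collections import OrderedDict


def hit_rate(num_procs, working_set, cache_lines, passes=10):
    """Hit rate of one shared LRU cache under per-reference time-slicing.

    Assumed toy model: every process loops over `working_set` private
    lines, and the processes are interleaved one reference at a time.
    """
    cache = OrderedDict()  # keys are (process, line); order tracks recency
    hits = 0
    accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            for proc in range(num_procs):  # switch process every reference
                key = (proc, line)
                accesses += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)  # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "processes:", hit_rate(n, working_set=24, cache_lines=64))
```

In this model the hit rate holds up while the combined working sets fit in
the cache and collapses once they do not, which is the pressure toward N sets
of caches. The looping access pattern is close to a worst case for LRU, so
real workloads would likely degrade less abruptly.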
-- 
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086