Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!apple!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Barrel processors & string ops [really: Don't look back...]
Message-ID: <13582@winchester.mips.COM>
Date: 19 Feb 89 23:41:33 GMT
References: <747@atanasoff.cs.iastate.edu> <28200275@mcdurb> <4290@pt.cs.cmu.edu>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 230

In article <4290@pt.cs.cmu.edu> shivers@centro.soar.cs.cmu.edu (Olin Shivers) writes:
>Andy Glew mentioned a barrel processor discussion, and says that
>mash@mips posted a good argument against them (barrel processors, not
>discussions). I would very much like to see that argument.

I don't have a copy of the original, but the argument follows two
inter-related areas: technical issues and business issues, and
can be summarized as follows:
	1) For technical reasons, it's more complicated to build VLSI
	micros as barrels.
	2) Cheap general-purpose chips tend to dominate special-purpose
	solutions, unless the special-purpose ones have substantial
	long-term cost or performance advantages.

Good background material can be found in:
	Bell, Mudge, McNamara, "COMPUTER ENGINEERING: A DEC View of
	Hardware Systems Design", 1978, Digital Press.
Specifically, read Chapter 1 "Seven Views of Computer Systems",
especially Views 3 and 4 and, above all, Figure 7 on levels of integration.

Following is a (brief) technical argument, followed by a (long) business
argument that addresses a bunch of related issues that people have asked about.
Also, sorry if I step on anybody's toes; maybe this will stir up some
discussion.

1) Technical:
	The first-order determinant of CPU performance, for general purpose
	machines, is the aggregate bandwidth into the CPU, with about
	1 VUP ==(approx) 10 MB/sec [try this rule-of-thumb and see].
	Take the same technology and cache memory.  You can either build
	an N-way barrel processor, where each barrel slot generates B VUPs,
	or you can build 1 CPU that generates about N*B VUPs, because
	the basic hardware is running at the same speed.  The single CPU
	has to fight with latency issues that are avoided by the barrel,
	but the barrel:
	-wastes whole slots whenever there are fewer than N tasks available;
	-needs N copies of registers and state, in general, i.e., things that
	 are fast, and therefore expensive, if only in opportunity cost;
	-probably has worse cache behavior, in terms of the separate
	 tasks banging into each other more.
		OF COURSE, ALL THIS NEEDS QUANTIFICATION.
	The more you split the hardware apart [like separate caches],
	the closer you get to separate processors.
I think barrel designs might make more sense in board-level
implementations than they do in chip-level designs.  It is often less
expensive to replicate state in the former, and also, to afford
really wide busses all over the place.  Maybe it might make sense to
do a barrel design for 1 design round if you think you can get to VLSI
in the next.
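
The slot-wasting point can be made concrete with a toy model. This is only
a sketch under idealized assumptions that are not in the argument above:
no latency stalls and no cache interference (which flatters the single CPU,
since latency is exactly what the barrel hides). The function names and the
values N=4, B=2 are hypothetical and chosen purely for illustration.

```python
# Toy model of the barrel-vs-single-CPU tradeoff described above.
# Assumptions (not from the post): idealized hardware, no latency
# stalls, no cache interference. All function names are hypothetical.

MB_PER_SEC_PER_VUP = 10.0  # the post's rule of thumb: 1 VUP ~ 10 MB/sec


def barrel_vups(n_slots, vups_per_slot, runnable_tasks):
    """Delivered VUPs of an N-way barrel; slots with no task do no work."""
    busy_slots = min(n_slots, max(runnable_tasks, 0))
    return busy_slots * vups_per_slot


def single_cpu_vups(n_slots, vups_per_slot, runnable_tasks):
    """Delivered VUPs of one CPU from the same hardware: about N*B if busy."""
    return n_slots * vups_per_slot if runnable_tasks > 0 else 0.0


def bandwidth_mb_per_sec(vups):
    """Aggregate CPU bandwidth implied by the rule of thumb."""
    return vups * MB_PER_SEC_PER_VUP


if __name__ == "__main__":
    n, b = 4, 2.0  # illustrative values only
    for tasks in range(6):
        barrel = barrel_vups(n, b, tasks)
        single = single_cpu_vups(n, b, tasks)
        print(f"tasks={tasks}  barrel={barrel:4.1f} VUPs  "
              f"single={single:4.1f} VUPs  "
              f"barrel bandwidth={bandwidth_mb_per_sec(barrel):5.1f} MB/sec")
```

In this model the barrel only catches up with the single CPU once at least
N tasks are runnable; with one task it delivers B VUPs against roughly N*B.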

Anyway, quite specifically, the detailed tradeoffs in building VLSI CPU chips
seem to argue against building them as barrels:
	I don't know of any existing popular CISC or RISC chips that are
	barrels; if anybody does, please point them out.
	Likewise, although this is harder data to know, the next round of
	chips is not likely to do this either: everybody is working on
	more integrated chips and things like super-scalar or super-pipelined
	designs, but the MIPS Competitive Intelligence Division has yet to turn
	up any barrel chips out there.  Maybe if we're all building
	2M-transistor chips, we'll find that we can't think of anything
	better to do, although I doubt it...

2) Business issues: (now, it gets long)
	The computer business is fundamentally different than it was even 10
years ago, basically because of the microprocessor.  Specifically, if you
are a systems designer, and if you choose to design your own CPUs, rather
than implement your system out of commercially-available micros, you'd
better have a Real Good reason, of which the following are some:
	a) You're building something that has to be binary-compatible with
	an existing line.  Your choice is either to build things out of
	gate-arrays, or semi-custom, or full-custom VLSI, in order of
	ascending cost and difficulty. [Gate-arrays: most supermini & mainframe
	vendors; full-custom VLSI: DEC CVAX.]
	b) You're a semiconductor vendor, also, and your business is building
	VLSI chips anyway. [Intel, Motorola, etc.]
	c) You're a system vendor who thinks they can design a CPU architecture
	and get it to be popular enough to gain access to successive
	technologies, so that it stays on the leading edge of the technology
	curve in a cost-effective way [Sun & MIPS].
	d) You're building something whose performance or functionality
	cannot be done with the existing micros or next year's micros [CONVEX,
	CRAY]

However, if you're building something from scratch, it had better do something
a lot better than next year's micros, or you'll get run over from behind
by CPUs that have
	1) bottom-to-top range of applicability, not limited
	to a narrow price-performance niche,
	2) volume, and hence lower cost, and
	3) a bigger software base*, and
	4) more $$ coming in to fuel the next round of development to the
	next range of performance.
* Caveat: you do have to be careful that you don't just count # packages
available, but number of RELEVANT packages for the kinds of machines you're
building.  For example, ability to run MSDOS applications is a plus for
workstations, but probably not very relevant to somebody who wants a Convex,
so you can't compare architectures by counting applications.  Nevertheless,
application availability does count.

Not that many years ago, there used to be LOTS of companies who built
mini / supermini class machines out of TTL (and then maybe ECL).
You'd probably be surprised how many different proprietary minis have
been built: I looked at the DataPro research reports, 1987, and found about
50 different mini or supermini architectures [there used to be more].
Of these, some were produced by companies that have since disappeared;
many of them may never be upgraded; only a few are supported by companies
successful enough to make the continued enhancement worthwhile.

In the early 1980s, proprietary minis started getting badly hurt by
the 16-bit micros, and low-end superminis were getting threatened by
32-bit micros.  Only a few mini/supermini vendors are left, really.
Of course, this is the second wave of this: consider the consolidations
in the companies building mainframes and others in the 1950s and 1960s...

OPINION, PERHAPS BIASED (REMEMBER WHERE I WORK): 
1) There exist VLSI RISCs in production that already show faster integer
and scalar FP performance than any of the popular superminis. Before the
end of the year, people will ship ECL VLSI RISCs at supermini prices,
whose corresponding uniprocessor performance is equivalent to Amdahl
5990s or IBM 3090s.  In addition, one should expect to see, during
1990-1992, CMOS or BiCMOS chips from which one can build 50-100 VUPs
machines (still uniprocessor).  There's no reason not to have a
1000-VUP multi in a large file-cabinet-size box by 1992 / 1993 (although
we'd sure better get some faster disks by then!) at costs competitive
with current superminis.
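
A back-of-the-envelope check of the 1000-VUP figure, using only numbers
from this post (50-100 VUPs per uniprocessor, and the 1 VUP ~ 10 MB/sec
rule of thumb). Perfect linear multiprocessor scaling is assumed here purely
for the arithmetic, and the helper name is hypothetical:

```python
import math

MB_PER_SEC_PER_VUP = 10  # rule of thumb quoted earlier in the post
TARGET_VUPS = 1000       # the "1000-VUP multi"


def cpus_needed(target_vups, vups_per_cpu):
    """CPUs required, assuming (optimistically) perfect linear scaling."""
    return math.ceil(target_vups / vups_per_cpu)


for vups_per_cpu in (50, 100):
    count = cpus_needed(TARGET_VUPS, vups_per_cpu)
    print(f"{vups_per_cpu} VUPs/CPU -> {count} CPUs")

# Aggregate bandwidth the memory system would have to sustain.
print(f"aggregate bandwidth ~ {TARGET_VUPS * MB_PER_SEC_PER_VUP} MB/sec")
```

That is, 10 to 20 processors and something on the order of 10 GB/sec of
aggregate bandwidth, if the rule of thumb holds at that scale.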

2) Most mini/supermini architectures born in the 1970s or early 1980s
are essentially doomed, unless they're owned by a company with strong
finances, a big customer base, or, perhaps, a customer base that's heavily
locked in for some reason or other.  Some of the older mainframe architectures
are also doomed, for the same reason.  [Note: doom doesn't mean they disappear
overnight, but that it gets harder and harder to justify upgrades, and if
a company takes the approach of relying only on its installed base of locked-in
customers, trouble is coming.]
	Now, this doesn't mean that the company owning those architectures
	is doomed.  Some mini companies have taken thoughtful and timely
	steps to adapt to the new technology without dumping their customers:
	HP would be a good example: think how long ago they saw the RISC stuff
	coming, and how much work went into assuring reasonable migration.
	Others have been working the problem as well; some have not, to the
	best of my knowledge, and I suspect they're going to get hurt.

3) Proprietary mini-supers are in serious danger in the next year or two:
one can already see the bloodbath going on there.  With apologies to my
friends at various places, it's hard to see why anybody but Convex
is really going to prosper and remain independent in this.
Note that Convex seems wisely to be taking the strategy of moving up
chasing supercomputers and staying out of the frenzy at the lower-end of
this market, which is, of course, the part starting to be attacked by
the VLSI RISCs.  I know this overlap is starting to happen, because we
(MIPS and some of its friends) are seeing a lot more competitive run-ins
with some of the mini-super guys.  We lose some (like: real vector problem,
need 1GB of memory, need some application that we don't have yet), but
we win some already on cost/performance, and sometimes even on performance.
An M/120 (a $30K thing in a PC/AT box) has been known to beat some
mini-supers in some kinds of big number-crunching benchmarks, and that is
Not Good News.... (well, it's good news for us...:-)

What happens in 1989/1990? Well, we expect to see the first VLSI ECL RISCs
appear, at least from us and Sun.  These things have got to be Bad News,
as they'll be in the 30-60 VUPs range, with reasonable scalar floating-point.
They're likely to be quite competitive (on a performance basis) with
many of the mini-supers, except in really heavy vector or vector-parallel
applications, and they'll probably win on cost/performance numbers in
even more cases, leaving a fairly narrow niche.
However, even worse is the software problem.   One of the biggest difficulties
for mini-supers is the difficulty of getting software on them: the machines
are expensive enough that you don't just leave them around at bunches of
3rd-party software developers.  BTW, 3rd-party developers are sane people,
and they don't port software for free, and they care about the number of
machines on which they can sell their software.  This makes it Real Hard
if you only have a few hundred machines in the field, unless your machines
are among the few able to run the application. (Note how important it is to
be the first to get to a new zone of cost/performance, i.e., part of why
CRAY and Convex have been successful).
This is not a problem faced by the ECL RISCs,
which both already have large numbers of software-compatible machines out there.
To get a feeling for the scope of the problem, here are some numbers:
From COMPUTERWORLD, Feb 13, 1989, page 130 "High Performance Computers":
Minisuper installed base as of yearend 88 (Computer Technology Research Corp):
450	FPS
430	Convex
335	Alliant
110	Elxsi
 45	SCS
150??	Multiflow* (from a different source)
----
1520	TOTAL
* The CW article didn't include Multiflow; CSN 2/13/89, p. 46, says "As of June
1988, Multiflow had sold 44 of its Trace computers.  Since then, the company
has stopped revealing how many systems it has sold, but Joseph Fisher,
co-founder and EVP, said the 4th and 3rd quarters generated the largest and
second-largest revenue for the company in its four-year history."
Assume the installed base is now 150 machines (probably optimistic).
(And of course, who knows how accurate these numbers really are? However,
they're probably the right order of magnitude.  To be fair, the CW article
claimed minisupers were a real hot growth area, and I'm using the numbers
in the opposite direction....)

Now, MIPS and/or semiconductor partners have shipped about 20,000 chipsets,
as of YE1988.  Of course, many of them have gone into prototypes, or into
dedicated applications, or other things.  Still, MIPS itself built on
the order of 1000 machines, as well as a lot of boards that have gone into
others, and of course, some of our friends have shipped more MIPS-based
machines than we have.  Although I'm not privy to the numbers :-),
there must be 5-15K SPARC-based things out there, mostly in Sun-4s.
In late 1989, the mini-supers will have to face the spectre of competing with
fast and cost-effective machines whose CPU performance overlaps
at least the lower-middle of the minisuper performance
range, each of which has an installed base of lots of 10s of thousands,
low-end machines in the $10K range or lower, lots of software, and
little messing around to get reasonable performance.
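
To put the two sets of figures side by side, here is a small sketch that
uses only the numbers quoted above. Note the caveats already stated: the
Multiflow figure is a guess, the SPARC range is an estimate, and chipsets
shipped are not the same thing as machines in the field, so the ratios are
order-of-magnitude at best.

```python
# Minisuper installed base, year-end 1988, as quoted above.
minisuper_installed = {
    "FPS": 450,
    "Convex": 430,
    "Alliant": 335,
    "Elxsi": 110,
    "SCS": 45,
    "Multiflow": 150,  # assumed in the post, "probably optimistic"
}
minisuper_total = sum(minisuper_installed.values())

mips_chipsets = 20_000                   # "about 20,000 chipsets", YE1988
sparc_low, sparc_high = 5_000, 15_000    # "5-15K SPARC-based things"

print(f"minisuper total: {minisuper_total}")
print(f"MIPS chipsets / minisupers: {mips_chipsets / minisuper_total:.1f}x")
print(f"SPARC / minisupers: {sparc_low / minisuper_total:.1f}x "
      f"to {sparc_high / minisuper_total:.1f}x")
```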

Of course, CPU performance alone does not a minisuper make, and none of
this should be taken as disparagement of folks who work at any of these
companies, some of whom have built hardware or software that I respect
greatly.  All I suggest is that the old quote is appropriate:
	"Don't look back.  Something might be gaining on you."

To finish this long tome with the thing that started it: a barrel design
had better show some compelling and lasting advantage over VLSI RISCs,
because it will probably be more expensive to build, and if it doesn't
get volume, business reality will make its life very hard.
Sorry for the length of this, but the topics have come up in a number of
side e-mail conversations, and it seemed to fit here.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086