Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!rutgers!apple!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Barrel processors & string ops [really: Don't look back...]
Message-ID: <13582@winchester.mips.COM>
Date: 19 Feb 89 23:41:33 GMT
References: <747@atanasoff.cs.iastate.edu> <28200275@mcdurb> <4290@pt.cs.cmu.edu>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 230

In article <4290@pt.cs.cmu.edu> shivers@centro.soar.cs.cmu.edu (Olin Shivers) writes:
>Andy Glew mentioned a barrel processor discussion, and says that
>mash@mips posted a good argument against them (barrel processors, not
>discussions). I would very much like to see that argument.

I don't have a copy of the original, but the argument follows two
inter-related areas: technical issues and business issues, and can
be summarized as follows:
	1) For technical reasons, it's more complicated to build VLSI
	micros as barrels.
	2) Cheap general-purpose chips tend to dominate special-purpose
	solutions, unless the special-purpose ones have substantial
	long-term cost or performance advantages.

Good background material could be found in:
	Bell, Mudge, McNamara, "COMPUTER ENGINEERING", a DEC View of
	Hardware Systems Design, 1978, Digital Press.
Specifically, read Chapter 1 "Seven Views of Computer Systems", especially
Views 3 and 4, and especially, Figure 7 on levels of integration.

Following is a (brief) technical argument, followed by a (long) business
argument that addresses a bunch of related issues that people have asked
about. Also, sorry if I step on anybody's toes; maybe this will
stir up some discussion.

1) Technical:

The first-order determinant of CPU performance, for general purpose machines,
is the aggregate bandwidth into the CPU, with about 1 VUP ==(approx) 10 MB/sec
[try this rule-of-thumb and see]. Take the same technology and cache memory.
You can either build an N-way barrel processor, where each barrel slot
generates B VUPs, or you can build 1 CPU that generates about N*B VUPs,
because the basic hardware is running at the same speed. The single CPU
has to fight with latency issues that are avoided by the barrel, but
the barrel:
	-wastes whole slots whenever there are fewer than N tasks available;
	-needs N copies of registers and state, in general, i.e., things
	that are fast, and therefore expensive, if only in opportunity cost;
	-probably has worse cache behavior, in terms of the separate tasks
	banging into each other more.
OF COURSE, ALL THIS NEEDS QUANTIFICATION.

The more you split the hardware apart [like separate caches], the closer
you get to separate processors.

I think barrel designs might make more sense in board-level implementations
than they do in chip-level designs. It is often less expensive to replicate
state in the former, and also, to afford really wide busses all over
the place. Maybe it might make sense to do a barrel design for 1 design
round if you think you can get to VLSI in the next.

Anyway, quite specifically, the detailed tradeoffs in building VLSI CPU chips
seem to argue against building them as barrels: I don't know of any
existing popular CISC or RISC chips that are barrels; if anybody does,
please point them out. Likewise, although this is harder data to know,
the next round of chips is not likely to do this either: everybody is
working on more integrated chips and things like super-scalar or
super-pipelined designs, but the MIPS Competitive Intelligence Division
has yet to turn up any barrel chips out there. Maybe if we're all
building 2M-transistor chips, we'll find that we can't think of
anything better to do, although I doubt it...

2) Business issues: (now, it gets long)

The computer business is fundamentally different than it was even 10 years
ago, basically because of the microprocessor.
Specifically, if you are a systems designer, and if you choose to design
your own CPUs, rather than implement your system out of commercially-available
micros, you'd better have a Real Good reason, of which the following are some:

	a) You're building something that has to be binary-compatible with
	an existing line. Your choice is either to build things out of
	gate-arrays, or semi-custom, or full-custom VLSI, in order of
	ascending cost and difficulty. [Gate-arrays: most supermini &
	mainframe vendors; full-custom VLSI: DEC CVAX.]

	b) You're a semiconductor vendor, also, and your business is
	building VLSI chips anyway. [Intel, Motorola, etc.]

	c) You're a system vendor who thinks they can design a CPU
	architecture and get it to be popular enough that it gets access
	to successive technologies, so that it stays with the leading edge
	of the technology curve in a cost-effective way [Sun & MIPS].

	d) You're building something whose performance or functionality
	cannot be done with the existing micros or next year's micros
	[CONVEX, CRAY]

However, if you're building something from scratch, it had better do
something a lot better than next year's micros, or you'll get run over from
behind by CPUs that have
	1) bottom-to-top range of applicability, not limited to a narrow
	price-performance niche,
	2) volume, and hence lower cost,
	3) a bigger software base*, and
	4) more $$ coming in to fuel the next round of development to the
	next range of performance.

* Caveat: you do have to be careful that you don't just count # packages
available, but number of RELEVANT packages for the kinds of machines you're
building. For example, ability to run MSDOS applications is a plus for
workstations, but probably not very relevant to somebody who wants a Convex,
so you can't compare architectures by counting applications. Nevertheless,
application availability does count.

Not that many years ago, there used to be LOTS of companies who built
mini / supermini class machines out of TTL (and then maybe ECL).
You'd probably be surprised how many different proprietary minis have been
built: I looked at the DataPro research reports, 1987, and found about
50 different mini or supermini architectures [there used to be more].
Of these, some were produced by companies that have since disappeared;
many of them may never be upgraded; only a few are supported by companies
successful enough to make the continued enhancement worthwhile.
In the early 1980s, proprietary minis started getting badly hurt by the
16-bit micros, and low-end superminis were getting threatened by 32-bit
micros. Only a few mini/supermini vendors are left, really.
Of course, this is the second wave of this: consider the consolidations
in the companies building mainframes and others in the 1950s and 1960s...

OPINION, PERHAPS BIASED (REMEMBER WHERE I WORK):

1) There exist VLSI RISCs in production that already show faster integer
and scalar FP performance than any of the popular superminis.
Before the end of the year, people will ship ECL VLSI RISCs at supermini
prices, whose corresponding uniprocessor performance is equivalent to
Amdahl 5990s or IBM 3090s. In addition, one should expect to see,
during 1990-1992, CMOS or BiCMOS chips from which one can build 50-100 VUPs
machines (still uniprocessor). There's no reason not to have a 1000-VUP
multi in a large file-cabinet-size box by 1992 / 1993 (although we'd sure
better get some faster disks by then!) at costs competitive with
current superminis.

2) Most mini/supermini architectures born in the 1970s or early 1980s
are essentially doomed, unless they're owned by a company with strong
finances, a big customer base, or, perhaps, a customer base that's heavily
locked in for some reason or other. Some of the older mainframe
architectures are also doomed, for the same reason.
[Note: doom doesn't mean they disappear overnight, but that it gets harder
and harder to justify upgrades, and if a company takes the approach of
relying only on its installed base of locked-in customers, trouble is coming.]

Now, this doesn't mean that the company owning those architectures is doomed.
Some mini companies have taken thoughtful and timely steps to adapt to the
new technology without dumping their customers: HP would be a good example:
think how long ago they saw the RISC stuff coming, and how much work went
into assuring reasonable migration. Others have been working the problem
as well; some have not, to the best of my knowledge, and I suspect they're
going to get hurt.

3) Proprietary mini-supers are in serious danger in the next year or two:
one can already see the bloodbath going on there.
(Apologies to my friends at various places), but it's hard to see why anybody
but Convex is really going to prosper and remain independent in this.
Note that Convex seems wisely to be taking the strategy of moving up chasing
supercomputers and staying out of the frenzy at the lower-end of this market,
which is, of course, the part starting to be attacked by the VLSI RISCs.

I know this overlap is starting to happen, because we (MIPS and some of its
friends) are seeing a lot more competitive run-ins with some of the mini-super
guys. We lose some (like: real vector problem, need 1GB of memory, need some
application that we don't have yet), but we win some already on
cost/performance, and sometimes even on performance. An M/120 (a $30K thing
in a PC/AT box) has been known to beat some mini-supers in some kinds of big
number-crunching benchmarks, and that is Not Good News....
(well, it's good news for us...:-)

What happens in 1989/1990? Well, we expect to see the first VLSI ECL RISCs
appear, at least from us and Sun. These things have got to be Bad News,
as they'll be in the 30-60 VUPs range, with reasonable scalar floating-point.
They're likely to be quite competitive (on a performance basis) with many of
the mini-supers, except in really heavy vector or vector-parallel
applications, and they'll probably win on cost/performance numbers in
even more cases, leaving a fairly narrow niche.

However, even worse is the software problem. One of the biggest difficulties
for mini-supers is the difficulty of getting software on them: the machines
are expensive enough that you don't just leave them around at bunches of
3rd-party software developers. BTW, 3rd-party developers are sane people,
and they don't port software for free, and they care about the number of
machines on which they can sell their software. This makes it Real Hard if
you only have a few hundred machines in the field, unless your machines
are among the few able to run the application. (Note how important it is to
be the first to get to a new zone of cost/performance, i.e., part of why
CRAY and Convex have been successful). This is not a problem faced by the
ECL RISCs, which both already have large numbers of software-compatible
machines out there.

To get a feeling for the scope of the problem, here are some numbers:
From COMPUTERWORLD, Feb 13, 1989, page 130 "High Performance Computers":
Minisuper installed base as of yearend 88 (Computer Technology Research Corp):

	 450	FPS
	 430	Convex
	 335	Alliant
	 110	Elxsi
	  45	SCS
	 150??	Multiflow* (from a different source)
	----
	1520	TOTAL

* This article didn't include Multiflow: CSN 2/13/89, p46. says
"As of June 1988, Multiflow had sold 44 of its Trace computers. Since then,
the company has stopped revealing how many systems it has sold, but
Joseph Fisher, co-founder and EVP, said the 4th and 3rd quarters generated
the largest and second-largest revenue for the company in its four-year
history."
Assume the installed base is now 150 machines (probably optimistic).

(And of course, who knows how accurate these numbers really are? However,
they're probably the right order of magnitude.
To be fair, the CW article claimed minisupers were a real hot growth area,
and I'm using the numbers in the opposite direction....)

Now, MIPS and/or semiconductor partners have shipped about 20,000 chipsets,
as of YE1988. Of course, many of them have gone into prototypes, or into
dedicated applications, or other things. Still, MIPS itself built
on the order of 1000 machines, as well as a lot of boards that have gone
into others, and of course, some of our friends have shipped more MIPS-based
machines than we have. Although I'm not privy to the numbers :-), there must
be 5-15K SPARC-based things out there, mostly in Sun-4s.

In late 1989, the mini-supers will have to face the spectre of competing with
fast and cost-effective machines whose CPU performance overlaps at least the
lower-middle of the minisuper performance range, each of which has an
installed base of lots of 10s of thousands, low-end machines in the $10K
range or lower, lots of software, and little messing around to get
reasonable performance.

Of course, CPU performance alone does not a minisuper make, and none of this
should be taken as disparagement of folks who work at any of these companies,
some of whom have built hardware or software that I respect greatly.
All I suggest is that the old quote is appropriate:
	"Don't look back. Something might be gaining on you."

To finish this long tome with the thing that started it: a barrel design
had better show some compelling and lasting advantage over VLSI RISCs,
because it will probably be more expensive to build, and if it doesn't
get volume, business reality will make its life very hard.

Sorry for the length of this, but the topics have come up in a number
of side e-mail conversations, and it seemed to fit here.
--
-john mashey	DISCLAIMER:
UUCP:	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:	408-991-0253 or 408-720-1700, x253
USPS:	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086