Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!apple!vsi1!daver!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Error in Posting of SPEC numbers on IBM systems
Message-ID: <36426@mips.mips.COM>
Date: 24 Feb 90 02:41:25 GMT
References: <36189@mips.mips.COM> <14900004@hpdmd48.HP.COM>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 92

In article <14900004@hpdmd48.HP.COM> sritacco@hpdmd48.HP.COM (Steve Ritacco) writes:
>Ok, let's talk some architecture stuff.
>Why is it that the RIOS has a bigger data cache than instruction cache?
>This defies conventional wisdom. Data caches are less effective than
>instruction caches and are usually made small because their hit ratio
>doesn't increase with size as rapidly as instruction cache. If I had to
>guess what is going on, I would guess that access to the I-cache is
>very wide, to suport super-scaler, so they crammed all they could on
>the CPU chip. This seems to be pretty effective. IBM has shown the
>super-scaler architecture works, which up to this point I wasn't convinced
>of. The benifits are tangible. A 20MgHz CPU with 8K I-cache and 32K
>D-cache SPECmarked at 22.something. That is quite impressive. The R3000
>which to date seemed the most efficient CPU/system implementation has
>been displaced for the moment.

1) Super-scalar works, at least for getting at more of the low-level
parallelism of FP code. This is clearly shown by the IBM systems.

2) It's not clear that it works, or their specific case works. This might be:
	a) Compilers will get better (likely) for integer.
	b) Compilers will get better, a lot (unlikely) for integer.
It is always possible that there's a whole lot of mileage to be gained,
but past experience says to doubt it; it's not like this compiler
technology is a raw new technology: IBM has been doing excellent
optimization for a long time.
Certainly, our experience has been that most of the micro-level
scheduling improvements over the last few years have been more in the FP
area than in the integer area. Anyway, I'd counsel keeping an open mind,
but I'd also advise not just believing it when some IBM marketing guy
says "Oh, we haven't really taken advantage of that," as if it's going
to get magically better. On the other hand, if one of their good
technical folks like Marty Hopkins says there's a big jump coming, then
one should pay serious attention.

3) In general, this does raise an interesting issue, which is comparing
cache sizes. Some people build smaller, special-purpose,
N-way-set-associative caches; some people build various-sized,
direct-mapped caches from standard SRAM. Both ways are legitimate, and
there are various tradeoffs in terms of power, space, and cost. One
thing to be careful of is saying that something did it with a small
cache, because one would also want to know how much a special-purpose
cache chip costs. I don't claim to be unbiased: I like using standard
SRAMs, because they always get cheap, and so far, I think that chip
costs argue with me, but there are legitimate reasons for doing it the
other way, too. Of course, we're not likely to know for sure the cost of
the IBM chips, so it's not so easy to compare.

Note, just in case anyone is misled by the following, there is no
announced product from MIPS called an R4000....

>I don't doubt that the R4000 will out-perform RIOS, there is one thing
>that I wonder about though. Will the R4000 out-perform RIOS brute force
>that is high integration and very high clock speeds, or will it beat
>it by providing greater architectual efficiency? The only SPARC imple
>mentations that beat mips have a major clock speed difference (not to
>mention higher implemenatation cost). One other concern is the hype
>associated with super-scaler. In the EE-Times article someone from
>mips stated that the R4000 was going to be super-scaler.
>That's the

As far as I know, no one from MIPS, who knows, has ever publicly said
that it would be super-scalar (or that it wouldn't). What we generally
say is:
	there are various flavors of multiple-issue machines:
	superscalar, superpipelined, and VLIW (or maybe, short VLIW,
	which is what I'd call the i860 and DN10000, sort of), and
	that anybody who wants to be competitive in the current round
	of chips needs to do one of these, and that of course we've
	been working on this for several years, and are familiar with
	the variations, and are doing one, or some combination, but that
	we explicitly refuse to disclose which flavor we're using
	in the R?000.

At some panel session within the last year or so, I commented that
architectural simulation is a necessity, because this area is far
beyond human intuition. I said, for instance, that we'd burned huge
numbers of cycles simulating the effects of being able to do various
pairs of instructions simultaneously, and comparing results, and that
we'd been thinking about "supersonic" pipelines for years.

Actually, what I think has happened is that super-scalar has become like
RISC. I.e., for a while, lots of people got convinced that if something
had register windows, that was RISC. Right now, almost anything with an
aggressive pipeline gets called super-scalar, because for most people,
the implementation nuances are irrelevant.

>first time I had heard that. Made me wonder if it is true, or just an
>attempt to ride the hype wave. If better performance can be had with
>a simpler design (non super-scaler) due to less complexity, why not tell

As usual, I recommend Hennessy's article in the September 89 UNIX Review;
it also contains some references to other articles with good studies
of super-scalar issues.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086