Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!usc!orion.oac.uci.edu!uci-ics!ucla-cs!oahu!frazier
From: frazier@oahu.cs.ucla.edu (Greg Frazier)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC (rational discussion, not religious wars)
Keywords: Die Space
Message-ID: <28985@shemp.CS.UCLA.EDU>
Date: 9 Nov 89 17:50:24 GMT
References: <503@ctycal.UUCP> <15126@haddock.ima.isc.com> <28942@shemp.CS.UCLA.EDU> <31097@winchester.mips.COM>
Sender: news@CS.UCLA.EDU
Reply-To: frazier@oahu.UUCP (Greg Frazier)
Organization: UCLA Computer Science Department
Lines: 61

In article <31097@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <28942@shemp.CS.UCLA.EDU> frazier@oahu.UUCP (Greg Frazier) writes:
>... [me proposing multiple CPU chips]
>I don't think we're anywhere near this yet, and this can be seen by
>analyzing the layout and nature of million-transistor chips [like i860s].
>If you look at the i860 die, you find that:
>	a) Most of the transistors are in the caches.
>	b) Most of the space is the FPU, registers, integer datapath, etc.
>	   Some of this stuff is wires, and it doesn't shrink as well as
>	   transistors do.
>	c) At the top speed claimed for it, eventually [50MHz], 12KB
>	   of cache is NOWHERE near big enough for efficiency, by itself.
[ further discussion of how the $ is still too small ]

Yeah, I've been wondering why people are bothering with on-chip
caches.  Admittedly, a 2-chip CPU would probably be significantly
more expensive than a single-chip CPU, but I think the product
would be more flexible, and achieve better performance, if,
instead of putting the $ on the CPU chip, the $ were provided on
a companion chip.  This should increase the latency to the cache
by only a single clock cycle, while significantly boosting the
hit ratio - particularly if this were a data $ (leave the
instruction $ on chip - it belongs there).
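The latency-vs-hit-ratio tradeoff above can be checked with a quick
average-memory-access-time (AMAT) calculation.  Here is a minimal
modern sketch; the hit ratios and cycle counts are the illustrative
assumptions used later in this post, not measurements of any real chip:

```python
# Back-of-envelope AMAT comparison: small on-chip cache vs. a larger
# off-chip cache that costs one extra cycle on a hit.
# All numbers are illustrative assumptions, not measurements.

def amat(hit_ratio, hit_cycles, miss_cycles):
    """Average cycles per memory reference."""
    return hit_ratio * hit_cycles + (1.0 - hit_ratio) * miss_cycles

on_chip  = amat(0.85, 1, 14)   # on-chip $:  85% hits, 1-cycle hit, 14-cycle miss
off_chip = amat(0.95, 2, 14)   # off-chip $: 95% hits, 2-cycle hit, 14-cycle miss

print(f"on-chip : {on_chip:.2f} cycles/reference")
print(f"off-chip: {off_chip:.2f} cycles/reference")

# Break-even miss delay M solves 0.85*1 + 0.15*M = 0.95*2 + 0.05*M,
# i.e. 0.10*M = 1.05, so M = 10.5 cycles: for shorter miss delays
# the on-chip cache wins, for longer ones the off-chip cache wins.
break_even = (0.95 * 2 - 0.85 * 1) / (0.15 - 0.05)
print(f"break-even miss delay: {break_even:.1f} cycles")
```

Changing either hit ratio moves the break-even point, which is the
"your mileage will vary" caveat below.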
With a separate cache chip, one could a) put some intelligence on
it, and/or b) make it expandable, so that a user could choose to
provide two or three $ chips.  Of course, this is not a terribly
new idea - but I don't know why it isn't being used in the newer
chips.  Realtime people would love it - they would put NO $ chips
on, and simply populate the board with fast local memory (almost
the same thing, but with more predictable behavior).

This scheme would free up space on chip for... two CPUs, or
multiple FPUs, or a single-cycle FPU (that's an attractive one!),
or any of a host of other performance-boosting schemes.  A
disadvantage is that the CPU chip would have to have multiple
ports - a chip with both the inst. $ and the data $ on chip can
have the two $'s share a single port to memory, since
(presumably) neither is using it very often.  However, unless one
wants 128-bit paths to memory (which one might), I don't think
having 2 ports is a big deal.

Quick performance analysis hack:

on-chip $:  assume 85% hit ratio, 1 cycle delay on hit,
            14 cycle delay on miss
off-chip $: assume 95% hit ratio, 2 cycle delay on hit,
            14 cycle delay on miss (a smart $ will not incur
            extra miss delay)

on-chip  memory speed: .85*1 + .15*14 = 2.95 cycles/reference
off-chip memory speed: .95*2 + .05*14 = 2.60 cycles/reference - a win!

With the hit ratios I have assumed, the break-even point is a
miss delay of about 10.5 cycles - below that, the on-chip cache
becomes a win.  Of course, change the hit ratios and you change
the break-even point, so your mileage will vary.  As a final
note, if the $ chip is closely married to the CPU chip, there is
no reason why the 2-cycle delay can't be achieved, I think.

Greg Frazier
"They thought to use and shame me but I win out by nature, because a true
freak cannot be made.  A true freak must be born." - Geek Love

Greg Frazier	frazier@CS.UCLA.EDU	!{ucbvax,rutgers}!ucla-cs!frazier