Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!usc!ucla-cs!oahu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC (rational discussion, not religious wars)
Keywords: Die Space
Message-ID: <29018@shemp.CS.UCLA.EDU>
Date: 10 Nov 89 02:22:24 GMT
References: <503@ctycal.UUCP> <15126@haddock.ima.isc.com> <28942@shemp.CS.UCLA.EDU> <31097@winchester.mips.COM> <28985@shemp.CS.UCLA.EDU>
Sender: news@CS.UCLA.EDU
Reply-To: marc@oahu.UUCP (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 48

In article <28985@shemp.CS.UCLA.EDU> frazier@oahu.UUCP (Greg Frazier) writes:
> (Some stuff on why an on-chip cache may not be a good idea)
>Quick performance analysis hack:
>on-chip $, assume 85% hit ratio, 1 cycle delay on hit, 14 cycle delay
> on miss
>off-chip $, assume 95% hit ratio, 2 cycle delay on hit, 14 cycle delay
> on miss (a smart $ will not incur extra miss delay)
>
>on-chip memory speed: .85*1 + .15*14 = 2.95 cycles/reference
>off-chip memory speed: .95*2 + .05*14 = 2.60 cycles/reference - a win!
>
>With the hit ratios I have assumed, the break-even point is a memory
>delay of 10 cycles - below that, the on-chip cache becomes a win.

Regarding data caches, most implementations with a 2-cycle load delay
(on a hit) allow overlapping of instructions, so that a decent compiler
can schedule an instruction in the delay slot. If we assume that the
load delay can be filled, say, 50% of the time, we obtain a break-even
point of around 6 cycles for the miss delay, which makes the on-chip
cache even more questionable if *only* this factor is considered.

To really evaluate the impact of an on-chip cache, though, we have to
look at other factors such as:

1) With an on-chip cache it is a lot easier to implement a wide
datapath (64 or 128 bits) between the cache and the register file than
it is with an off-chip cache, which requires lots of pins and lots of
routing.
A wide datapath allows the use of instructions that take advantage of
the extra bandwidth to i) save/restore the register file more quickly
(for example on calls/returns) and ii) load and store double-precision
operands in one cycle (two double-precision operands can be loaded with
128 bits).

2) What applications are the processor and cache targeted at? For
example, if the chip is used mostly for applications with lots of
relatively small loops and heavy floating-point computation, then an
on-chip instruction cache makes a lot of sense, since the hit ratio
will be high.

3) Cost of flushing the cache on a context switch. Cost of maintaining
cache coherency in a multiprocessor environment, etc.

					Marc Tremblay
					marc@CS.UCLA.EDU