Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!usc!ucla-cs!oahu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC (rational discussion, not religious wars)
Keywords: Die Space
Message-ID: <29018@shemp.CS.UCLA.EDU>
Date: 10 Nov 89 02:22:24 GMT
References: <503@ctycal.UUCP> <15126@haddock.ima.isc.com> <28942@shemp.CS.UCLA.EDU> <31097@winchester.mips.COM> <28985@shemp.CS.UCLA.EDU>
Sender: news@CS.UCLA.EDU
Reply-To: marc@oahu.UUCP (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 48

In article <28985@shemp.CS.UCLA.EDU> frazier@oahu.UUCP (Greg Frazier) writes:
> (Some stuff on why an on-chip cache may not be a good idea)
>Quick performance analysis hack:
>on-chip $, assume 85% hit ratio, 1 cycle delay on hit, 14 cycle delay
> on miss
>off-chip $, assume 95% hit ratio, 2 cycle delay on hit, 14 cycle delay
> on miss (a smart $ will not incur extra miss delay)
>
>on-chip memory speed: .85*1 + .15*14 = 2.95 cycles/reference
>off-chip memory speed: .95*2 + .05*14 = 2.60 cycles/reference - a win!
>
>With the hit ratios I have assumed, the break-even point is a memory
>delay of 10 cycles - below that, the on-chip cache becomes a win.

Regarding data caches, most implementations with a 2-cycle load delay
(on a hit) allow overlapping of instructions, so that a decent compiler
can schedule an instruction in the delay slot. If we assume that the
load delay can be filled, say, 50% of the time, we obtain a break-even
point of around 6 cycles for the miss delay, which makes the on-chip
cache even more questionable if *only* this factor is considered.

To really evaluate the impact of an on-chip cache, though, we have to
look at other factors such as:

1) With an on-chip cache it is a lot easier to implement a wide
datapath (64 or 128 bits) between the cache and the register file than
it is with an off-chip cache, which requires lots of pins and lots of
routing.
A wide datapath allows the use of instructions that take advantage of
the extra bandwidth to i) save/restore the register file more quickly
(for example on calls/returns) and ii) load and store double-precision
operands in one cycle (two double-precision operands can be loaded with
128 bits).

2) What applications are the processor and cache targeted at? For
example, if the chip is used mostly for applications with lots of
relatively small loops and heavy floating-point computation, then an
on-chip instruction cache makes a lot of sense, since the hit ratio
will be high.

3) Cost of flushing the cache on a context switch. Cost of maintaining
cache coherency in a multiprocessor environment, etc.

					Marc Tremblay
					marc@CS.UCLA.EDU