Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Instruction caches and closures
Message-ID: <39979@mips.mips.COM>
Date: 9 Jul 90 02:54:48 GMT
References: <1990Jul7.041100.2413@xavax.com>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 59

In article <1990Jul7.041100.2413@xavax.com> alvitar@xavax.com (Phillip Harbison) writes:
>In article <1756@charon.cwi.nl> jack@cwi.nl (Jack Jansen) writes:
>> This I/D cache discussion sparked a thought: I get the impression that
>> separate I/D caches are used mainly to get the benefit of two-way

>This is not the reason for split I&D caches.  If you think about it,
>the split I&D caches require about as much hardware as any two-way
>set associative cache (two sets of comparators, two sets of tag RAMS,
>two data paths, two bus connections, etc.).  There appears to be two
>major reasons for using a split I&D cache.  [1] It allows the CPU to
>implement a Harvard architecture and enjoy the associated benefits,
>i.e. twice as much memory bandwidth.  [2] It allows each cache to be
>tuned for performing a task.  For example, large cache block sizes
>may be very beneficial in an I-cache but less useful in a D-cache.
>Also, the I-cache doesn't have to be writeable by the CPU.

All of these are good comments.  In addition:
1) In some cases, especially with an off-chip cache, the tag-comparison
on a load or instruction fetch is in series with delivery to the
processor.  It is probably easier to hide this for an I-fetch than for
a load instruction, by various trickery.  However, in some designs,
it can be part of the critical path for a load instruction, either:
	a) Adding a cycle to the load latency.
	OR
	b) Lengthening the cycle time.
All of this is easier to deal with when the caches are on-chip.
2) With direct-mapped caches, it is sometimes easier to optimize the
(most-common) cache-hit case, because you can send the data
from cache -> pipeline while doing the tag check and parity check in
parallel, and then suppress the load and restart the pipeline as needed.

3) Depending on the nature of the cache, sometimes a direct-mapped cache
can easily implement a 1-cycle store, which is difficult for a 2-way set-associative
cache.  Of course, for any store into a cache for which there is not
a separate tag for the unit being stored, you're likely to pay 2 cycles or
more anyway.  Fortunately, some degree of write buffering helps mask all of
this in almost any cache design.
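
As a rough illustration of point 2, here is a toy Python model of a
direct-mapped lookup. It is only a sketch: the line size, the number of
lines, and all the names (split_address, DirectMappedCache) are made up
for the example and describe no particular machine. The thing to notice
is that the index alone selects the data, so the data can be forwarded
while the tag comparison is still in progress.

```python
# Toy model only: these sizes are illustrative assumptions.
LINE_BYTES = 16    # assumed line size in bytes
NUM_LINES = 256    # assumed number of lines in the cache


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.data = [None] * NUM_LINES

    def fill(self, addr, line):
        """Install a line, replacing whatever shared its index."""
        tag, index, _ = split_address(addr)
        self.tags[index] = tag
        self.data[index] = line

    def load(self, addr):
        """Return (speculative_data, hit)."""
        tag, index, _ = split_address(addr)
        # The index alone picks the only candidate line, so hardware can
        # start sending it toward the pipeline immediately.
        speculative = self.data[index]
        # The tag check runs in parallel; on a mismatch the forwarded data
        # must be suppressed and the pipeline restarted.
        hit = self.tags[index] == tag
        return speculative, hit


cache = DirectMappedCache()
cache.fill(0x1230, "line A")
same_line = cache.load(0x1230)                              # matching tag
conflict = cache.load(0x1230 + LINE_BYTES * NUM_LINES)      # same index, other tag
```

In a set-associative cache the tag comparison has to choose among the
ways before the right data is known, which is why this overlap is harder
to get there.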

4) See: Steven Przybylski, CACHE AND MEMORY HIERARCHY DESIGN,
Morgan Kaufmann, Spring 1990, for thorough analyses of the tradeoffs
among cache designs, and good references to important papers in this area.
It especially talks about the cycle-time impacts of various design choices,
or why:
	What is your cache-miss rate?
			is NOT a very useful question compared to
	What % of the total cycles are lost to the memory system?
			which is a better question.
Of course, the best question is:
	How long does it take to run?
		because it is quite possible for a system to look WORSE
		on either of the measures above, if it turns out that
		a "more-efficient-looking" design lengthens the
		cycle time of the machine MORE than its efficiency
		improvement.
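
To make the three questions concrete, here is a small Python sketch of
the usual back-of-the-envelope model. Every number in it is invented for
illustration (base CPI, references per instruction, memory latency, and
both designs), and the names Design, cpi, fraction_lost, and
ns_per_instruction are hypothetical. It assumes a fixed memory latency in
ns, so the miss penalty in cycles depends on the cycle time.

```python
import math
from collections import namedtuple

# Illustrative assumptions only, not measurements of any real machine.
BASE_CPI = 1.0          # cycles per instruction with a perfect memory system
REFS_PER_INSTR = 1.3    # instruction fetch plus data references
MEMORY_NS = 400.0       # fixed time to service a miss, in ns

Design = namedtuple("Design", "name miss_rate cycle_ns")


def stall_cpi(d):
    """Cycles per instruction lost to the memory system."""
    penalty_cycles = math.ceil(MEMORY_NS / d.cycle_ns)
    return REFS_PER_INSTR * d.miss_rate * penalty_cycles


def cpi(d):
    return BASE_CPI + stall_cpi(d)


def fraction_lost(d):
    """Share of all cycles lost to the memory system."""
    return stall_cpi(d) / cpi(d)


def ns_per_instruction(d):
    """The measure that matters: time per instruction."""
    return cpi(d) * d.cycle_ns


# A: simpler cache, more misses, shorter cycle.
# B: "more efficient" cache, fewer misses, longer cycle.
a = Design("A", miss_rate=0.05, cycle_ns=40.0)
b = Design("B", miss_rate=0.04, cycle_ns=44.0)

for d in (a, b):
    print(d.name, d.miss_rate, round(fraction_lost(d), 3),
          round(ns_per_instruction(d), 2))
```

With these made-up numbers, B has the lower miss rate and loses a
smaller share of its cycles to memory (about 34% against about 39%), yet
it takes roughly 66.9 ns per instruction against 66.0 ns for A.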
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086