Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Instruction caches and closures
Message-ID: <39979@mips.mips.COM>
Date: 9 Jul 90 02:54:48 GMT
References: <1990Jul7.041100.2413@xavax.com>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 59

In article <1990Jul7.041100.2413@xavax.com> alvitar@xavax.com (Phillip Harbison) writes:
>In article <1756@charon.cwi.nl> jack@cwi.nl (Jack Jansen) writes:
>> This I/D cache discussion sparked a thought: I get the impression that
>> separate I/D caches are used mainly to get the benefit of two-way

>This is not the reason for split I&D caches.  If you think about it,
>the split I&D caches require about as much hardware as any two-way
>set associative cache (two sets of comparators, two sets of tag RAMS,
>two data paths, two bus connections, etc.).  There appears to be two
>major reasons for using a split I&D cache.  [1] It allows the CPU to
>implement a Harvard architecture and enjoy the associated benefits,
>i.e. twice as much memory bandwidth.  [2] It allows each cache to be
>tuned for performing a task.  For example, large cache block sizes
>may be very beneficial in an I-cache but less useful in a D-cache.
>Also, the I-cache doesn't have to be writeable by the CPU.

All of these are good comments.  In addition:

1) In some cases, especially with an off-chip cache, the tag comparison
on a load or instruction fetch is in series with delivery to the
processor.  It is probably easier to hide this for an I-fetch than for
a load instruction, by various trickery.  However, in some designs, it
can be part of the critical path for a load instruction, either:
	a) Adding a cycle to the load latency.
	OR
	b) Lengthening the cycle time.
All of this is easier to deal with when the caches are on-chip.
2) With direct-mapped caches, it is sometimes easier to optimize the
(most-common) cache-hit case, because you can send the data from
cache -> pipeline, do the tag check and parity check in parallel, and
then suppress the load and restart the pipeline as needed.

3) Depending on the nature of the cache, sometimes a direct-mapped cache
can easily implement a 1-cycle store, which is difficult for a 2-set
associative cache.  Of course, for any store into a cache for which
there is not a separate tag for the unit being stored, you're likely
to pay 2 cycles or more anyway.  Fortunately, some degree of write
buffering helps mask all of this in almost any cache design.

4) See: Steven Przybylski, CACHE AND MEMORY HIERARCHY DESIGN, Morgan
Kaufmann, Spring 1990, for thorough analyses of the tradeoffs among
cache designs, and good references to important papers in this area.
It especially talks about the cycle-time impacts of various design
choices, or why:
	What is your cache-miss rate?
is NOT a very useful question compared to
	What % of the total cycles are lost to the memory system?
which is a better question.  Of course, the best question is:
	How long does it take to run?
because it is quite possible for a system to look BETTER on either of
the first two measures and still run SLOWER, if it turns out that a
"more-efficient-looking" design lengthens the cycle time of the machine
MORE than its efficiency improvement.
-- 
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
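
A rough sketch of the three questions in point 4), for anyone who wants to
play with the arithmetic.  Every number here is made up for illustration
(cycle times, miss rates, a fixed 400 ns miss penalty, 1.3 memory
references per instruction, a base CPI of 1.0); none of it comes from the
post or from any real machine, and the model ignores write buffering and
everything else that matters in practice.  With these assumed numbers the
2-way design wins on miss rate and on % of cycles lost, yet comes out at
about 66.9 ns per instruction against 66.0 ns for the direct-mapped one.

```python
import math
from dataclasses import dataclass

# All constants below are hypothetical, chosen only to illustrate the point.
BASE_CPI = 1.0           # cycles per instruction with a perfect memory system
REFS_PER_INSTR = 1.3     # one I-fetch plus 0.3 loads/stores per instruction
MISS_PENALTY_NS = 400.0  # main-memory latency, independent of CPU cycle time


@dataclass
class Design:
    name: str
    cycle_ns: float   # machine cycle time
    miss_rate: float  # misses per memory reference


def stall_cycles_per_instr(d: Design) -> float:
    # A miss costs a whole number of cycles, so round the penalty up.
    penalty_cycles = math.ceil(MISS_PENALTY_NS / d.cycle_ns)
    return REFS_PER_INSTR * d.miss_rate * penalty_cycles


def fraction_cycles_lost(d: Design) -> float:
    # "What % of the total cycles are lost to the memory system?"
    stalls = stall_cycles_per_instr(d)
    return stalls / (BASE_CPI + stalls)


def ns_per_instr(d: Design) -> float:
    # "How long does it take to run?" (per instruction)
    return (BASE_CPI + stall_cycles_per_instr(d)) * d.cycle_ns


# Assumption: associativity buys a lower miss rate but costs 10% in cycle time.
direct_mapped = Design("direct-mapped", cycle_ns=40.0, miss_rate=0.05)
two_way = Design("2-way set assoc.", cycle_ns=44.0, miss_rate=0.04)

for d in (direct_mapped, two_way):
    print(f"{d.name:18s} miss rate {d.miss_rate:.1%}  "
          f"cycles lost {fraction_cycles_lost(d):.1%}  "
          f"time {ns_per_instr(d):.2f} ns/instr")
```

Change the cycle-time penalty or the miss rates and the ranking can flip
either way, which is the reason to ask the third question rather than the
first.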