Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker.mit.edu!shelby!neon!torrie
From: torrie@cs.stanford.edu (Evan Torrie)
Newsgroups: comp.arch
Subject: Re: 68040 and caches
Message-ID: <1991Feb27.082251.23059@Neon.Stanford.EDU>
Date: 27 Feb 91 08:22:51 GMT
References: <19330@cbmvax.commodore.com>
Sender: torrie@Neon.Stanford.EDU (Evan James Torrie)
Organization: Computer Science Department, Stanford University
Lines: 56

jesup@cbmvax.commodore.com (Randell Jesup) writes:
> Random question concerning the 68040: what do people think about
>the utility/cost effectiveness/need for external caches (given that it
>has ?4-way? associative 4K I and D caches internally and a single
>external bus.

I don't have any figures for the 68040, but for a very good
explanation of the details and tradeoffs in cache design, take a look
at Steven Przybylski's "Cache and Memory Hierarchy Design: A
Performance-Directed Approach", Morgan Kaufmann, 1990.

There are, of course, many issues which would dictate whether adding
a second-level cache is "worth it". Workload is a big factor - Unix
type development environments are very different from personal
computers. Cost also plays an important part. If you're striving for
the last percentage point of performance, you can afford to spend a
lot on the cache. If you're prepared to sacrifice peak speed in order
to get a low cost machine (as it seems NeXT's designers have chosen),
the 4K I/D caches are probably sufficient.

Writing from a MIPS RISC perspective, Przybylski suggests that 4K
caches are far from optimal. He suggests 64K - 256K for an external
cache, and argues a case for direct-mapped over set-associative
caches.

You mention the issue of code density on the 040 vs RISC type
machines. I wonder if this will actually be less of a factor than it
is with the 68030.
I believe it's true that the 040, taking its RISC-like approach, is
actually optimised for the very simple addressing modes, and will have
a lower overall CPI if the code uses more of these simple instructions
in place of instructions with complicated 680x0 addressing modes.
Perhaps someone else can confirm this, along with whether this is
being implemented in any 68040-specific compilers.

> What about external caches on other CISC's, such as 68030's, x86's
>(yech), etc? Certainly at some point you get insufficient gain for the
>expense of adding more cache (I know, "insufficient" is a subjective term).
>I'm interested in where people think the crossovers are (and I suppose for
>RISC's too while we're at it).

My opinion... For a Unix type workload, 64K-256K of cache is about
where you should be now. For a PC type workload, 32K-64K. Apple seems
to have done studies on its IIfx and IIci cache designs which indicate
that anything more than 32K of cache for the Mac design is overkill.

Other crossover points... set associativity beyond 2-way to 4-way is
wasted. A block size of 4 to 8 words is usually optimal.

My $0.02 worth...

--
------------------------------------------------------------------------------
Evan Torrie. Stanford University, Class of 199? torrie@cs.stanford.edu
"And in the death, as the last few corpses lay rotting in the slimy
thoroughfare, the shutters lifted in inches, high on Poacher's Hill, and