Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!spool.mu.edu!agate!stanford.edu!neon.Stanford.EDU!torrie
From: torrie@cs.stanford.edu (Evan Torrie)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: 680x0 vs 80x86
Message-ID: <1991Jun30.090351.22717@neon.Stanford.EDU>
Date: 30 Jun 91 09:03:51 GMT
Article-I.D.: neon.1991Jun30.090351.22717
References: <92@ryptyde.UUCP> <4671.tnews@templar.actrix.gen.nz> <1154@stewart.UUCP> <1991Jun25.165516.13021@mintaka.lcs.mit.edu> <1991Jun27.064123.27492@neon.Stanford.EDU>
Sender: torrie@neon.Stanford.EDU (Evan James Torrie)
Organization: Computer Science Department, Stanford University, Ca , USA
Lines: 129

kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:

>According to Hennessy and Patterson - Computer Architecture: A Quantitative
>Approach, pgs 423-425.  Assuming 53% references for instructions, an 8k
>unified cache vs a 4k instruction, 4k data cache; the results are as
>follows.

>Miss Rates
>SIZE    Instruction only    Data only    Unified
>8KB     5.8%                6.8%         8.3%

>You get an overall miss rate of 6.27% for data/instruction separate.
>You get an overall miss rate of 8.3% for unified.

  No argument here so far.

>>  But you still have only one internal path from the CPU to the cache, thus
>>cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
>>example of why this is important, check out the parallelism in any of today's
>>microprocessors' pipelines.
>>  Does Intel still use their 386 instruction prefetch buffer in the
>>486?  I suppose that should shore up some of the performance loss from
>>having a unified cache.
>>
>We know both companies claim a hit rate above 90% for their caches.
>You also forget that replacement algor. makes a lot of difference in the
>hit rate.

  But both use LRU replacement, so they're equivalent.

>Also separate caches require that you have replacement algor.
>for both caches.  You also need hardware for both control circuits.

  Yes, but this isn't much chip area compared to the actual cache
storage.

>You need two sets of tag rams, etc.

  But each cache is only 4K => the total # of tag rams is exactly the
same in the 486's 8K cache vs the '040's 2x4K.

>Intel made a trade between 2-3% performance improvement vs. less chip area
>and complexity of design.

  Uhhh, sorry.  This is where I violently disagree with you.  You make
the jump from miss ratio = 2-3% difference, to suddenly asserting that
overall performance improvement is only 2-3%.

  If we read from H&P again, pg 423:

  "Unlike other levels of the memory hierarchy, caches are sometimes
  divided into instruction-only and data-only caches.  Caches that can
  contain either instructions or data are unified caches, or mixed
  caches.  The CPU knows whether it is issuing an instruction address
  or a data address, so there can be separate ports for both, thereby
  doubling the bandwidth between the cache and the CPU.  (Section 6.4
  in Chapter 6 shows the advantages of dual memory ports for pipelined
  execution.)  Separate caches also offer the opportunity of
  optimising each cache separately: different capacities, block sizes,
  and associativities may lead to better performance.  SPLITTING THUS
  AFFECTS THE COST AND PERFORMANCE FAR BEYOND WHAT IS INDICATED BY THE
  CHANGE IN MISS RATES."  [my emphasis]

>They also got their product out the door a LOT
>faster than Motorola.

  But ended up being 20-25% slower.

>>  Moto went with separate caches because of their performance.

>Let's face it, during design you make trade offs.  Intel made one, Moto made
>another.

  Most trade-offs involve a choice.  For Intel, there was no such choice
if they wanted to retain their captive market.

>>
>>>Self modifying code would have broken the 386 with cache.
>>
>>  Not with a unified cache.

>Yes, with a unified cache you can break code that does weird things.

  An example of such "weird things"?

>With a separate cache you can break ill behaved code.

  Namely, self-modifying code.
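
  To put rough numbers on the miss-rate vs performance point above,
here is a small Python sketch.  The miss rates and the 53% instruction
share are the ones quoted from H&P earlier in this thread.  Everything
else is an illustrative assumption, not a measured 486 or '040 figure:
a base CPI of 1, a 10-cycle miss penalty, and a 1-cycle stall whenever
a data access competes with an instruction fetch for a single cache
port.

```python
# Miss rates quoted in the thread (H&P, 8KB row), as fractions.
INSTR_MISS = 0.058
DATA_MISS = 0.068
UNIFIED_MISS = 0.083
INSTR_FRACTION = 0.53  # share of all references that are instruction fetches

# ASSUMED values, chosen only for illustration (not from the thread).
MISS_PENALTY = 10      # cycles lost per cache miss
PORT_STALL = 1         # cycles lost when a data access blocks a fetch


def split_miss_rate(instr_miss, data_miss, instr_fraction):
    """Reference-weighted overall miss rate of a split I/D cache."""
    return instr_fraction * instr_miss + (1.0 - instr_fraction) * data_miss


def cpi(miss_rate, instr_fraction, miss_penalty, port_stall):
    """Cycles per instruction under a deliberately crude model.

    Assumes a base CPI of 1 and exactly one instruction fetch per
    instruction.  `port_stall` is charged once per data reference and
    should be 0 for a split cache with separate I and D ports.
    """
    refs_per_instr = 1.0 / instr_fraction        # fetches plus data refs
    data_refs_per_instr = refs_per_instr - 1.0   # loads and stores only
    miss_cycles = refs_per_instr * miss_rate * miss_penalty
    port_cycles = data_refs_per_instr * port_stall
    return 1.0 + miss_cycles + port_cycles


if __name__ == "__main__":
    split = split_miss_rate(INSTR_MISS, DATA_MISS, INSTR_FRACTION)
    print(f"split miss rate:   {split:.2%}")
    print(f"unified miss rate: {UNIFIED_MISS:.2%}")

    cpi_split = cpi(split, INSTR_FRACTION, MISS_PENALTY, 0)
    cpi_unified = cpi(UNIFIED_MISS, INSTR_FRACTION, MISS_PENALTY, PORT_STALL)
    print(f"CPI, split cache (two ports):  {cpi_split:.2f}")
    print(f"CPI, unified cache (one port): {cpi_unified:.2f}")
```

  With these assumed parameters the port-contention term contributes
more to the CPI gap than the difference in miss rates does.  The model
ignores instruction prefetch buffering, which would reduce that term,
so treat the output as a direction rather than a prediction.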

>The OS does not have to be aware of the cache unless it wants to turn it
>on or off.  The CPU has to intimately know the cache.  The OS does not
>need to know it is there.  The OS needs to know about the TLB because the
>OS will handle page faults, loading descriptor tables, and just overall
>handling of virtual memory.

  This depends on how simple your cache is.  If it's some large
second-level cache, then it's probably transparent to the OS.
  But high speed, on-chip caches more often than not require the
attention of the OS.  For example, virtually addressed caches require
flushes on context switches.  Copy-back caches require special
handling by the OS on I/O and shared memory operations (see H&P pg 467).
  In fact, with virtually addressed caches, a cache can even make its
presence felt up in the programming languages/OS interface.  See H&P
pg 460 for an example.

>A cache should be transparent; if someone tells you otherwise they are
>mistaken.

  H&P tells me that the OS often has to be aware of the cache.  So are
they mistaken?

>I've designed memory management and cache controller units.
>The cache controller has always been transparent to the software.

  But these are probably physically addressed, write through caches,
right?  Both of which can cause bottlenecks in a high performance cache
design.

>Even in multiprocessor systems the cache is transparent.  You would use
>a cache coherency protocol like MSI, MESI, MOESI, etc. and you would
>implement all your algor. in hardware.

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu
"I didn't get where I am today without knowing a good deal when I see one,
Reggie."  "Yes, C.J."