Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!spool.mu.edu!agate!stanford.edu!neon.Stanford.EDU!torrie
From: torrie@cs.stanford.edu (Evan Torrie)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: 680x0 vs 80x86
Message-ID: <1991Jun30.090351.22717@neon.Stanford.EDU>
Date: 30 Jun 91 09:03:51 GMT
Article-I.D.: neon.1991Jun30.090351.22717
References: <92@ryptyde.UUCP> <4671.tnews@templar.actrix.gen.nz> <1154@stewart.UUCP> <1991Jun25.165516.13021@mintaka.lcs.mit.edu> <e3e502oG080e01@JUTS.ccc.amdahl.com> <1991Jun27.064123.27492@neon.Stanford.EDU> <fbH=02SF08zd01@JUTS.ccc.amdahl.com>
Sender: torrie@neon.Stanford.EDU (Evan James Torrie)
Organization: Computer Science Department, Stanford University, Ca , USA
Lines: 129

kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:

>According to Hennessy and Patterson, Computer Architecture: A Quantitative
>Approach, pgs 423-425.    Assuming 53% references for instructions, an 8k
>unified cache vs a 4k instruction, 4k data cache; the results are as
>follows.
>Miss Rates
>SIZE             Instruction only     Data only      Unified
>8KB                  5.8%               6.8%          8.3%


>You get an overall miss rate of  6.27% for data/instruction separate.
>You get an overall miss rate of  8.3% for unified.

 No argument here so far.
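
  As a quick check of the arithmetic quoted above, the split-cache figure
is just the two miss rates weighted by the 53/47 reference mix.  A minimal
sketch (the function and variable names are mine, not from H&P):

```python
# Weighted miss rate for split caches, using the figures quoted above.
INSTR_FRACTION = 0.53   # share of references that are instruction fetches
INSTR_MISS = 5.8        # instruction-only miss rate, percent
DATA_MISS = 6.8         # data-only miss rate, percent
UNIFIED_MISS = 8.3      # unified miss rate, percent

def split_miss_rate(instr_fraction, instr_miss, data_miss):
    """Average the two miss rates, weighted by the reference mix."""
    return instr_fraction * instr_miss + (1.0 - instr_fraction) * data_miss

split = split_miss_rate(INSTR_FRACTION, INSTR_MISS, DATA_MISS)
print(f"split:   {split:.2f}%")
print(f"unified: {UNIFIED_MISS:.2f}%")
```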

>>  But you still have only one internal path from the CPU to the cache, thus
>>cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
>>example of why this is important, check out the parallelism in any of today's
>>microprocessors' pipelines.  
>>  Does Intel still use their 386 instruction prefetch buffer in the
>>486?  I suppose that should shore up some of the performance loss from
>>having a unified cache.
>>

>We know both companies claim a hit rate above 90% for their caches.
>You also forget that replacement algor. makes a lot of difference in the
>hit rate.   

  But both use LRU replacement, so they're equivalent.

>Also separate caches require that you have replacement algor.
>for both caches.  You also need hardware for both control circuits.

  Yes, but this isn't much chip area compared to the actual cache
storage.

>You need two sets of tag rams, etc.

  But each cache is only 4K => the total # of tag rams is exactly the same
in the 486's 8K cache vs the '040's 2x4K.
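
  A rough count, assuming one tag per cache line and the same line size
on both chips (the 16-byte figure below is an assumption for
illustration; tag width is ignored and may differ slightly):

```python
# Count tag entries: one tag per cache line.
LINE_BYTES = 16  # assumed line size, same for both designs

def tag_entries(cache_bytes, line_bytes=LINE_BYTES):
    """Number of lines, and hence tags, in a cache of the given size."""
    return cache_bytes // line_bytes

unified_tags = tag_entries(8 * 1024)        # one 8K unified cache
split_tags = 2 * tag_entries(4 * 1024)      # 4K instruction + 4K data
print(unified_tags, split_tags)
```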

>Intel made a trade between 2-3% performance improvement vs. less chip area
>and complexity of design.  

  Uhhh, sorry.  This is where I violently disagree with you.  You make the
jump from miss ratio = 2-3% difference, to suddenly asserting that overall
performance improvement is only 2-3%.
  Let's read from H&P again, pg 423:

"Unlike other levels of the memory hierarchy, caches are sometimes divided 
into instruction-only and data-only caches.  Caches that can
contain either instructions or data are unified caches, or mixed
caches.  The CPU knows whether it is issuing an instruction address or
a data address, so there can be separate ports for both, thereby
doubling the bandwidth between the cache and the CPU.  (Section 6.4 in
Chapter 6 shows the advantages of dual memory ports for pipelined
execution.)  Separate caches also offer the opportunity of optimising
each cache separately: different capacities, block sizes, and
associativities may lead to better performance.  SPLITTING THUS
AFFECTS THE COST AND PERFORMANCE FAR BEYOND WHAT IS INDICATED BY THE
CHANGE IN MISS RATES."
[my emphasis].
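
  To illustrate why the miss-rate gap understates the difference, here
is a toy CPI model that adds a port-conflict stall whenever a data access
competes with an instruction fetch for a single cache port.  MISS_PENALTY,
PORT_STALL and BASE_CPI are made-up illustrative values, and the model
ignores prefetch buffers and everything else a real pipeline does, so
treat it as a sketch of the reasoning rather than a prediction:

```python
# Toy CPI model: miss stalls plus cache-port conflicts.
# MISS_PENALTY, PORT_STALL and BASE_CPI are illustrative assumptions.
INSTR_FRACTION = 0.53   # reference mix quoted earlier in the thread
MISS_PENALTY = 10       # cycles per miss (assumed)
PORT_STALL = 1          # cycles lost when data and fetch collide (assumed)
BASE_CPI = 1.0          # ideal pipeline, one instruction per cycle

# Data references per instruction implied by the 53/47 mix.
DATA_REFS_PER_INSTR = (1.0 - INSTR_FRACTION) / INSTR_FRACTION

def cpi_unified(miss_rate, single_port=True):
    """CPI with one unified cache; optionally with only one port."""
    refs_per_instr = 1.0 + DATA_REFS_PER_INSTR
    miss_stalls = refs_per_instr * miss_rate * MISS_PENALTY
    port_stalls = DATA_REFS_PER_INSTR * PORT_STALL if single_port else 0.0
    return BASE_CPI + miss_stalls + port_stalls

def cpi_split(instr_miss, data_miss):
    """CPI with separate instruction and data caches (no port conflicts)."""
    miss_stalls = (instr_miss + DATA_REFS_PER_INSTR * data_miss) * MISS_PENALTY
    return BASE_CPI + miss_stalls

unified = cpi_unified(0.083)
unified_two_ports = cpi_unified(0.083, single_port=False)
split = cpi_split(0.058, 0.068)
print(f"unified, one port:  {unified:.2f}")
print(f"unified, two ports: {unified_two_ports:.2f}")
print(f"split:              {split:.2f}")
```

  The gap between the last two lines comes from miss rates alone; the
gap between the first two comes purely from the shared port.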

>They also got their product out the door a LOT
>faster than Motorola.

  But ended up being 20-25% slower.

>>  Moto went with separate caches because of their performance.

>Let's face it, during design you make trade-offs.  Intel made one, Moto made
>another.

  Most trade-offs involve a choice.  For Intel, there was no such
choice if they wanted to retain their captive market.

>>
>>>Self modifying code would have broken the 386 with cache.
>>
>>  Not with a unified cache.

>Yes, with a unified cache you can break code that does weird things. 

  An example of such "weird things"?  

>With a separate cache you can break ill behaved code.

  Namely, self-modifying code.
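
  A toy model of the difference, assuming write-through caches and no
hardware that keeps the instruction cache coherent with stores (the
class and function names are hypothetical, and real chips differ in
detail):

```python
# Toy model: self-modifying code on split vs unified caches.
class SplitCaches:
    def __init__(self, memory):
        self.memory = memory
        self.icache = {}   # filled by instruction fetches only
        self.dcache = {}   # filled by data accesses only

    def fetch(self, addr):
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        self.dcache[addr] = value
        self.memory[addr] = value   # write-through (assumed)

    def flush_icache(self):
        self.icache.clear()

class UnifiedCache:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}    # shared by fetches and data accesses

    def fetch(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value   # write-through (assumed)

def run_self_modifying(cache, addr=0x100):
    """Fetch an instruction, overwrite it, then fetch it again."""
    cache.fetch(addr)              # first execution caches the old opcode
    cache.store(addr, "new")       # the program patches itself
    return cache.fetch(addr)       # what actually gets executed next

split_result = run_self_modifying(SplitCaches({0x100: "old"}))
unified_result = run_self_modifying(UnifiedCache({0x100: "old"}))
print(split_result, unified_result)
```

  In this model the split design keeps executing the stale instruction
until something calls flush_icache(), while the unified design sees the
store immediately.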

>The OS does not have to be aware of the cache unless it wants to turn it
>on or off.  The CPU has to intimately know the cache.  The OS does not
>need to know it is there.  The OS needs to know about the TLB because the
>OS will handle page faults, loading descriptor tables, and just overall
>handling of virtual memory.

  This depends on how simple your cache is.  If it's some large
second-level cache, then it's probably transparent to the OS.  But, 
high speed, on-chip caches more often than not require the attention
of the OS.  For example, virtually addressed caches require flushes on
context switches.  Copy-back caches require special handling by the
OS on I/O and shared memory operations (see H&P pg 467).
  In fact, with virtually addressed caches, a cache can even make its
presence felt up in the programming languages/OS interface.  See 
H&P pg 460 for an example.
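
  The context-switch case can be sketched with a toy virtually addressed
cache whose lines carry no process identifier (an assumption of this
model; the names and the addresses are hypothetical):

```python
# Toy model: a virtually addressed cache without process tags.
class VirtualCache:
    def __init__(self, physical_memory):
        self.physical_memory = physical_memory
        self.lines = {}   # virtual address -> cached data

    def load(self, vaddr, page_table):
        """Look up by virtual address; translate only on a miss."""
        if vaddr not in self.lines:
            paddr = page_table[vaddr]
            self.lines[vaddr] = self.physical_memory[paddr]
        return self.lines[vaddr]

    def flush(self):
        self.lines.clear()

physical_memory = {0x1000: "data of A", 0x2000: "data of B"}
page_table_a = {0x40: 0x1000}   # process A maps virtual 0x40 here
page_table_b = {0x40: 0x2000}   # process B maps the same virtual address elsewhere

cache = VirtualCache(physical_memory)
cache.load(0x40, page_table_a)                   # process A runs
no_flush = cache.load(0x40, page_table_b)        # switch to B, OS did nothing

cache.flush()                                    # what the OS has to do
with_flush = cache.load(0x40, page_table_b)
print(no_flush, "|", with_flush)
```

  Without the flush, process B hits on a line that belongs to process A,
which is exactly the kind of thing the OS has to know about.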

>A cache should be transparent.  If someone tells you otherwise, they are
>mistaken.  

  H&P tells me that the OS often has to be aware of the cache.  So
are they mistaken?

>I've designed memory management and cache controller units.
>The cache controller has always been transparent to the software.

  But these are probably physically addressed, write-through caches, 
right?  Both of which can cause bottlenecks in a high-performance
cache design.

>Even in multiprocessor systems the cache is transparent.  You would use
>a cache coherency protocol like MSI, MESI, MOESI, etc. and you would
>implement all your algorithms in hardware.


-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"I didn't get where I am today without knowing a good deal when I see one,
 Reggie."  "Yes, C.J."