Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!think.com!spool.mu.edu!olivea!apple!voder!nsc!amdahl!JUTS!duts!kls30
From: kls30@duts.ccc.amdahl.com (Kent L Shephard)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: 680x0 vs 80x86
Message-ID: <fbH=02SF08zd01@JUTS.ccc.amdahl.com>
Date: 28 Jun 91 18:16:01 GMT
References: <92@ryptyde.UUCP> <4671.tnews@templar.actrix.gen.nz> <1154@stewart.UUCP> <1991Jun25.165516.13021@mintaka.lcs.mit.edu> <e3e502oG080e01@JUTS.ccc.amdahl.com> <1991Jun27.064123.27492@neon.Stanford.EDU>
Sender: netnews@ccc.amdahl.com
Reply-To: kls30@DUTS.ccc.amdahl.com (PUT YOUR NAME HERE)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 126

In article <1991Jun27.064123.27492@neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
>kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:
>
>>In article <1991Jun25.165516.13021@mintaka.lcs.mit.edu> rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) writes:
>>>
>>>  But this is different, Jerry, because in this case the OS KNOWS
>>>how to clear caches. If a lot of MS-DOG programs used self-modifying
>>>programs, or if the OS itself doesn't know how to treat caches,
>>>code will break. Hence, Intel probably keeping I&D unified to avoid
>>>an MS-DOG nightmare.
>
>>Wrong.  Intel decided to go with a unified cache for one because it is
>>simpler to implement.  Also if you have a 4 way set assoc. cache you
>>have basically 4 small caches.  
>

According to Hennessy and Patterson, Computer Architecture: A Quantitative
Approach, pp. 423-425: assuming 53% of references are for instructions, an 8K
unified cache vs. a 4K instruction cache plus a 4K data cache gives the
following results.
Miss Rates
SIZE             Instruction only     Data only      Unified

8KB                  5.8%               6.8%          8.3%


You get an overall miss rate of 6.27% for separate data/instruction caches
(0.53 x 5.8% + 0.47 x 6.8%).
You get an overall miss rate of 8.3% for unified.
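
The split-cache number is just the reference-weighted average of the two
miss rates. A quick check of that arithmetic, as a sketch in Python (the
function name is made up for illustration, it is not from the book):

```python
def split_miss_rate(instr_fraction, instr_miss, data_miss):
    """Reference-weighted miss rate for separate instruction and data caches."""
    return instr_fraction * instr_miss + (1.0 - instr_fraction) * data_miss

# Figures quoted above: 53% instruction references, miss rates in percent.
split = split_miss_rate(0.53, 5.8, 6.8)
unified = 8.3

print(f"split: {split:.2f}%  unified: {unified:.2f}%")
```

With the figures above this gives 6.27% for the split arrangement against
8.3% unified, a gap of about two percentage points.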


>  But you still have only one internal path from the CPU to the cache, thus
>cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
>example of why this is important, check out the parallelism in any of today's
>microprocessors' pipelines.  
>  Does Intel still use their 386 instruction prefetch buffer in the
>486?  I suppose that should shore up some of the performance loss from
>having a unified cache.
>

We know both companies claim a hit rate above 90% for their caches.
You also forget that the replacement algorithm makes a lot of difference in
the hit rate.  Also, separate caches require a replacement algorithm for
each cache.  You also need hardware for both control circuits, two sets of
tag RAMs, etc.

Intel traded a 2-3% performance improvement for less chip area and a less
complex design.  They also got their product out the door a LOT faster than
Motorola.

>>Also in Intel processors you have
>>instructions that have data included or immediately following.  Kind of
>>hard to separate data and instructions.
>
>  An example of such an instruction?  The 68K has data in its instructions, 
>the ADDQ #x, Dn for example, but this doesn't stop an I/D cache (since the 
>data is non-modifiable).
>
>>Moto went with a separate cache because the architecture is different.
>>The type of instructions are different.
>
>  Moto went with separate caches because of their performance.

Let's face it: during design you make trade-offs.  Intel made one, Moto made
another.

>
>>As for self modifying code.  The machines that use Moto processors are
>>more guilty of this.  The Mac and Atari machines uses self modifying code
>>for copy protection.  When Moto started putting small caches on their
>>chips it created a nightmare.
>
>  So the copy-protection schemes don't use self-modifying code anymore
>[in fact, most Mac programs don't use copy-protection other than
>manual-type methods].
>  At least Motorola could do this, unlike Intel.

Mac programs don't use self-modifying code now.  They did before, and it
broke a lot of software when Moto started putting instruction and data
caches (small, but there) on the 68k line of chips.

>
>>Self modifying code would have broken the 386 with cache.
>
>  Not with a unified cache.

Yes, with a unified cache you can break code that does weird things.  With
separate caches you can break ill-behaved code.
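
A toy sketch of how that breakage happens with separate caches, assuming an
instruction cache that never sees stores made on the data side. The class
and method names are made up for illustration; this is not a model of any
particular chip.

```python
class SplitCacheCPU:
    """Toy CPU: stores go straight to memory, the I-cache does not snoop them."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.icache = {}

    def fetch(self, addr):
        # Instruction fetches are served from the I-cache once a line is loaded.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Data-side write: memory is updated, the I-cache copy is left alone.
        self.memory[addr] = value

    def flush_icache(self):
        self.icache.clear()

cpu = SplitCacheCPU({0x100: "ADD"})
first = cpu.fetch(0x100)   # "ADD" is now held in the I-cache
cpu.store(0x100, "SUB")    # the program patches its own code
stale = cpu.fetch(0x100)   # still "ADD": the patch is not seen
cpu.flush_icache()
fresh = cpu.fetch(0x100)   # "SUB" after an explicit flush
print(first, stale, fresh)
```

Code that patches itself and then jumps into the patch works only if
something flushes or invalidates the instruction cache in between.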

>
>>Also if a cache is designed properly it should be completely transparent to
>>software.
>
>  Transparent to user software, perhaps, but often the OS has
>to be intimately aware of the cache, just as it has to be aware of the 
>TLB.

The OS does not have to be aware of the cache unless it wants to turn it
on or off.  The CPU has to know the cache intimately.  The OS does not
need to know it is there.  The OS needs to know about the TLB because the
OS will handle page faults, loading descriptor tables, and the overall
handling of virtual memory.

The OS knows nothing about a cache miss unless the page was swapped to
disk.  You would then get a page fault, bring the page into physical
memory, then the CPU would handle the cache miss.

A cache should be transparent; if someone tells you otherwise, they are
mistaken.  I've designed memory management and cache controller units.
The cache controller has always been transparent to the software.

Even in multiprocessor systems the cache is transparent.  You would use
a cache coherency protocol like MSI, MESI, MOESI, etc. and you would
implement all your algorithms in hardware.
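
For reference, the simplest of those protocols, MSI, can be written down as
a small per-line state table. This is a textbook-style sketch in Python for
readability only; the event names (PrRd, PrWr, BusRd, BusRdX) follow common
textbook usage, and real hardware does this in the cache controller, not in
software.

```python
# States: M (modified), S (shared), I (invalid).
# Events: processor read/write (PrRd, PrWr) and snooped bus requests
# from other caches (BusRd = read, BusRdX = read for ownership).
MSI = {
    ("I", "PrRd"): "S",
    ("I", "PrWr"): "M",
    ("I", "BusRd"): "I",
    ("I", "BusRdX"): "I",
    ("S", "PrRd"): "S",
    ("S", "PrWr"): "M",
    ("S", "BusRd"): "S",
    ("S", "BusRdX"): "I",   # another cache wants to write: drop our copy
    ("M", "PrRd"): "M",
    ("M", "PrWr"): "M",
    ("M", "BusRd"): "S",    # supply the dirty line, keep a shared copy
    ("M", "BusRdX"): "I",   # supply the dirty line, then invalidate
}

def next_state(state, event):
    """Next state of one cache line for a given event."""
    return MSI[(state, event)]

def run(events, state="I"):
    """Apply a sequence of events to a line that starts out invalid."""
    for event in events:
        state = next_state(state, event)
    return state

print(run(["PrRd", "PrWr", "BusRd"]))
```

The sequence in the last line reads a line (I to S), writes it (S to M), then
sees another processor read it (M to S), so the line ends up shared.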

>-- 
>------------------------------------------------------------------------------
>Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
>Murphy's Law of Intelism:  Just when you thought Intel had done everything
>possible to pervert the course of computer architecture, they bring out the 860


--
/*  -The opinions expressed are my own, not my employers.    */
/*      For I can only express my own opinions.              */
/*                                                           */
/*   Kent L. Shephard  : email - kls30@DUTS.ccc.amdahl.com   */