Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site sauron.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-crg!lll-lcc!vecpyr!amd!pesnta!pyramid!gould9!ncr-sd!ncrcae!sauron!campbell
From: campbell@sauron.UUCP (Mark Campbell)
Newsgroups: net.arch
Subject: Re: 11/08/85 Dhrystone Benchmark Results
Message-ID: <594@sauron.UUCP>
Date: Mon, 25-Nov-85 13:38:58 EST
Article-I.D.: sauron.594
Posted: Mon Nov 25 13:38:58 1985
Date-Received: Fri, 29-Nov-85 21:28:23 EST
References: <1129@hou2h.UUCP> <643@cornell.UUCP> <340@ncr-sd.UUCP>
Reply-To: campbell@sauron.UUCP (Mark Campbell)
Distribution: net.arch
Organization: NCR Corp., Advanced System Development, Columbia, SC
Lines: 55
Keywords: dhrystone, cache

In article <340@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (0000-Jan Stubbs) writes:
>In article <643@cornell.UUCP> jqj@cornell.UUCP (J Q Johnson) writes:
>>How much does a typical cache architecture (say a 4K 2-way associative
>>cache, or the onboard cache on a 68020) effect Dhrystone performance?
>>
>I have found the 256 byte instruction cache on the 68020 to have small impact
>on dhrystone performance ( <10% ) and in any other larger benchmark program.
>This is easy to measure as you can turn off the cache. Much more important are 
>off chip data and instruction caches if large enough.

It is difficult to evaluate the effect of a cache architecture on system
performance without taking into account how that architecture is implemented
and how the rest of the system is implemented.  The MC68020 internal (on-chip)
cache is an excellent example: the implementation details of the system in
which the MC68020 resides are extremely important.

I also ran the Dhrystone benchmark on an MC68020-based system with the
internal cache disabled and enabled.  I found the performance degradation
due to disabling the internal cache to be at least 15%, with an average
degradation of 18% (from 4545 to 3846).  Some details of the system on
which I obtained these numbers are given below:

	Machine:		NCR Tower 32
	Clock Rate:		16.67 MHz
	External Caches:	0 Wait-State, 6K Direct-Mapped Program Cache
				1 Wait-State, 2K Direct-Mapped Data Cache
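
As a quick check of the arithmetic, the two scores quoted above give either
figure depending on which score is taken as the baseline.  The sketch below
only computes both ratios from those two numbers; it makes no claim about
which baseline the measurements themselves used, and the variable names are
illustrative.

```python
# Scores quoted in the text: internal cache enabled and disabled.
enabled = 4545
disabled = 3846

# Slowdown measured against the cache-enabled score.
loss_vs_enabled = (enabled - disabled) / enabled * 100

# Speedup measured against the cache-disabled score.
gain_vs_disabled = (enabled - disabled) / disabled * 100

print(f"relative to enabled score:  {loss_vs_enabled:.1f}%")
print(f"relative to disabled score: {gain_vs_disabled:.1f}%")
```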

Given these results, I have a problem understanding the less-than-10%
performance degradation given in the preceding article.  It seems to me
that the configuration I used would be very close to the worst case for
almost any architecture.  An n-way set associative cache might decrease
the degradation, but given the nature of the Dhrystone benchmark I doubt
that this decrease would be noticeable.

The minimum penalty for missing the MC68020 internal cache is one cycle
(60 ns at 16.67 MHz).  Decreasing the clock rate to 12.5 MHz increases the
minimum penalty to 80 ns, thus increasing the performance degradation
due to disabling the internal cache.  Likewise, increasing the number
of external program cache wait-states increases the performance
degradation.
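
The cycle-time figures above can be reproduced with a few lines of Python.
This is only a sketch: 12.5 MHz is simply the clock rate that corresponds to
an 80 ns cycle, and the "one cycle plus one cycle per wait-state" penalty
model is a simplifying assumption of mine, not a description of the actual
MC68020 bus timing.

```python
def cycle_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def miss_penalty_ns(clock_mhz, wait_states=0):
    """Assumed model: one cycle minimum, plus one cycle per wait-state."""
    return (1 + wait_states) * cycle_ns(clock_mhz)


for mhz in (16.67, 12.5):
    for ws in (0, 1):
        penalty = miss_penalty_ns(mhz, ws)
        print(f"{mhz:5.2f} MHz, {ws} wait-state(s): {penalty:5.1f} ns")
```

Under this model the penalty grows both when the clock is slowed and when
wait-states are added to the external program cache.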

With many other architectures, disabling the internal cache would cause
a much larger performance degradation.  The Sun 3, for example, uses a
"syncopated" clock to achieve an average memory access time of 90 ns (1.5
wait-states).  Thus the performance degradation increases dramatically without
the MC68020 cache.  [Note: Sun people...please correct me if I'm wrong -- EDN
isn't the best place to get technical information].  This will be the case in
any system in which the second level of the memory hierarchy for program
accesses (the level after the internal cache) can't be accessed within the
very small, no-wait-state timing window of an external bus fetch.
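
The 90 ns and 1.5 wait-state figures are consistent with each other if one
assumes a 60 ns cycle, i.e. the same 16.67 MHz clock as the Tower 32 above.
That clock rate is my assumption for the Sun 3 here, and the helper name is
hypothetical; the sketch just does the conversion.

```python
# Assumed clock rate for the conversion (not stated for the Sun 3 above).
CLOCK_MHZ = 16.67
CYCLE_NS = 1000.0 / CLOCK_MHZ


def ns_to_cycles(access_ns, cycle_ns=CYCLE_NS):
    """Express an access time in units of clock cycles."""
    return access_ns / cycle_ns


print(f"90 ns is about {ns_to_cycles(90):.2f} cycles at {CLOCK_MHZ} MHz")
```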
-- 

Mark Campbell
Phone:  (803)-791-6697
E-Mail: {decvax!mcnc, ihnp4!msdc}!ncsu!ncrcae!sauron!campbell