Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site sauron.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-crg!lll-lcc!vecpyr!amd!pesnta!pyramid!gould9!ncr-sd!ncrcae!sauron!campbell
From: campbell@sauron.UUCP (Mark Campbell)
Newsgroups: net.arch
Subject: Re: 11/08/85 Dhrystone Benchmark Results
Message-ID: <594@sauron.UUCP>
Date: Mon, 25-Nov-85 13:38:58 EST
Article-I.D.: sauron.594
Posted: Mon Nov 25 13:38:58 1985
Date-Received: Fri, 29-Nov-85 21:28:23 EST
References: <1129@hou2h.UUCP> <643@cornell.UUCP> <340@ncr-sd.UUCP>
Reply-To: campbell@sauron.UUCP (Mark Campbell)
Distribution: net.arch
Organization: NCR Corp., Advanced System Development, Columbia, SC
Lines: 55
Keywords: dhrystone, cache

In article <340@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (0000-Jan Stubbs) writes:
>In article <643@cornell.UUCP> jqj@cornell.UUCP (J Q Johnson) writes:
>>How much does a typical cache architecture (say a 4K 2-way associative
>>cache, or the onboard cache on a 68020) effect Dhrystone performance?
>>
>I have found the 256 byte instruction cache on the 68020 to have small impact
>on dhrystone performance ( <10% ) and in any other larger benchmark program.
>This is easy to measure as you can turn off the cache. Much more important are
>off chip data and instruction caches if large enough.

It is difficult to evaluate the effect of a cache architecture on the
performance of a system without taking into account the implementation
of that architecture and the implementation of the rest of the system.
The MC68020 internal (on-chip) cache is an excellent example: the
implementation details of the system in which the MC68020 resides are
extremely important.

I also executed the Dhrystone benchmark on an MC68020-based system with
the internal cache disabled and enabled.
I found the performance degradation due to disabling the internal cache
to be a minimum of 15%, with an average degradation of 18% (from 4545
to 3846). Some details of the system on which I obtained these numbers
are given below:

    Machine:          NCR Tower 32
    Clock Rate:       16.67 MHz
    External Caches:  0 Wait-State, 6K Direct-Mapped Program Cache
                      1 Wait-State, 2K Direct-Mapped Data Cache

Given these results, I have a problem understanding the less than 10%
performance degradation given in the preceding article. It seems to me
that the configuration I used would be very close to the worst case for
almost any architecture. An n-way set associative cache might decrease
the degradation, but given the nature of the Dhrystone benchmark I doubt
that this decrease would be noticeable.

The minimum penalty for missing the MC68020 internal cache is one cycle
(60 ns at 16.67 MHz). Decreasing the clock rate (to 12.5 MHz, for
example) raises the minimum penalty to 80 ns, thus increasing the
performance degradation due to the disabling of the internal cache.
Likewise, increasing the number of external program cache wait-states
causes the performance degradation to increase.

With many other architectures the disabling of the internal cache would
cause a much larger performance degradation. The Sun 3, for example,
uses a "syncopated" clock to achieve an average memory access time of
90 ns (1.5 wait-states). Thus the performance degradation increases
dramatically without the MC68020 cache. [Note: Sun people...please
correct me if I'm wrong -- EDN isn't the best place to get technical
information]. This will be the case in any system in which the second
level of the memory hierarchy with respect to program accesses (after
the first level, the internal cache) can't be accessed in the very
small, no wait-state timing window of external bus fetches.
--
Mark Campbell
Phone: (803)-791-6697
E-Mail: {decvax!mcnc, ihnp4!msdc}!ncsu!ncrcae!sauron!campbell
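
The cycle-time and percentage figures quoted in the post above can be
checked with a few lines of arithmetic. The following is only a rough
sketch: the helper names are made up for illustration, and 12.5 MHz is
an assumed value for the slower clock, chosen because it yields the
80 ns cycle the post mentions.

```python
# Rough sanity check of the figures quoted in the post.
# Assumptions (not from the post): the helper names, and 12.5 MHz as the
# slower clock that would give the quoted 80 ns cycle.


def cycle_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def loss_pct(enabled, disabled):
    """Drop in score with the cache disabled, relative to the enabled score."""
    return 100.0 * (enabled - disabled) / enabled


def gain_pct(enabled, disabled):
    """Rise in score with the cache enabled, relative to the disabled score."""
    return 100.0 * (enabled - disabled) / disabled


ENABLED, DISABLED = 4545, 3846  # Dhrystone scores as quoted in the post

print(f"cycle at 16.67 MHz:      {cycle_ns(16.67):.1f} ns")
print(f"cycle at 12.5 MHz:       {cycle_ns(12.5):.1f} ns")
print(f"1.5 cycles at 16.67 MHz: {1.5 * cycle_ns(16.67):.1f} ns")
print(f"loss vs. cache enabled:  {loss_pct(ENABLED, DISABLED):.1f}%")
print(f"gain vs. cache disabled: {gain_pct(ENABLED, DISABLED):.1f}%")
```

Computed this way, the quoted scores correspond to a loss of about 15.4%
measured against the cache-enabled score, or a gain of about 18.2%
measured against the cache-disabled score. The post does not say which
baseline its 15% and 18% figures use.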