Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!uwvax!rang From: rang@cs.wisc.edu (Anton Rang) Newsgroups: comp.arch Subject: Re: How Caches Work Message-ID: Date: 10 Sep 89 23:49:00 GMT References: <21936@cup.portal.com> <1082@cernvax.UUCP> <16306@watdragon.waterloo.edu> Sender: news@spool.cs.wisc.edu Organization: UW-Madison CS department Lines: 47 In-reply-to: tbray@watsol.waterloo.edu's message of 10 Sep 89 20:32:15 GMT In article <16306@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes: >In article <1082@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes: >+You may be running software that has a very low cache hit rate if you >+are doing CAD work or scientific calculations. Take this little loop >+for example: >+ >+ SUM = 0.0 >+ DO 10 I = 1, 1000000 >+ SUM = SUM + VEC(I) >+ 10 CONTINUE >+ >+A data cache is *no use at all* for this problem. You will get a >+cache miss on every data access. > >I thought caches respected the principle of locality. And this code has >really good locality. The instruction fetches here have good locality; the data fetches don't (1 MW of data is being accessed). The problem here is that 1 million words (assuming 1 word per vector element) of memory is being accessed. No matter HOW your cache is arranged, you are going to have to go to main memory for every one of those words (conceivably barring a few in the cache when you start). In fact, a (data) cache can hurt in this kind of code, because you have 1 million data fetch times, plus the overhead of the cache (small, but significant for very large problems). > And when VEC(I) hits a page for the first time, it'll be cached, [ ... ] ^^^^^^^^^^^^^^^ The problem is that caching something is not without cost. Loading 128 words into the cache from main memory is generally just as expensive as fetching 128 words from main memory directly. The cache only helps if you're doing multiple fetches of the same memory location before it's overwritten with other data. >+Similarly, copying data from one bit >+of memory to another will be limited by the raw memory speed. > >Say what? A := B, where A and B are 1 MW arrays, will require at least 2 million memory cycle times (1 million reads, 1 million writes). +----------------------------------+------------------+ | Anton Rang (grad student) | "VMS Forever!" | | University of Wisconsin--Madison | rang@cs.wisc.edu | +----------------------------------+------------------+