Path: utzoo!utgpu!watmath!watdragon!watsol!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Newsgroups: comp.arch
Subject: How Caches Work
Message-ID: <16306@watdragon.waterloo.edu>
Date: 10 Sep 89 20:32:15 GMT
References: <21936@cup.portal.com> <1082@cernvax.UUCP>
Sender: daemon@watdragon.waterloo.edu
Reply-To: tbray@watsol.waterloo.edu (Tim Bray)
Organization: U. of Waterloo, Ontario
Lines: 33

In article <1082@cernvax.UUCP> hjm@cernvax.UUCP (Hubert Matthews) writes:
+You may be running software that has a very low cache hit rate if you
+are doing CAD work or scientific calculations.  Take this little loop
+for example:
+
+      SUM = 0.0
+      DO 10 I = 1, 1000000
+	SUM = SUM + VEC(I)
+   10 CONTINUE
+
+A data cache is *no use at all* for this problem.  You will get a
+cache miss on every data access.  

Now hold on just a dag-blaggin' minute.  I'm a software weenie who's never
built a cache, but I thought I understood how they work.  If this is right,
obviously I don't at all.  Somebody who knows should either debunk this or
explain what's really going on, because I'm probably not alone in my
ignorance.

I thought caches exploited the principle of locality, and this code has
really good spatial locality.  In fact, I thought they were block- or
page-based: when VEC(I) touches a page for the first time, the whole page
gets cached, and the following accesses keep hitting the cache (bar nasty
context switches, etc.) until VEC(I) moves off that page.  One cache miss
per page; in the worst case, if VEC is DOUBLE (8 bytes per element) and the
page size is 512 bytes, that's one miss per 64 loop iterations instead of a
trip to main memory on every one.  Nyet?
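
To put numbers on that, here is a rough sketch of the arithmetic as a toy
direct-mapped cache model.  It is not a description of any real machine:
the function name count_misses, the 16-block cache size, the assumption that
VEC starts on a block boundary, and the assumption that SUM and I live in
registers are all mine, made up for illustration.

```python
def count_misses(addresses, block_size, num_blocks):
    """Count misses in a toy direct-mapped cache for a stream of byte addresses."""
    tags = [None] * num_blocks        # memory block currently held in each slot
    misses = 0
    for addr in addresses:
        block = addr // block_size    # memory block containing this address
        slot = block % num_blocks     # direct-mapped: one possible slot per block
        if tags[slot] != block:
            misses += 1
            tags[slot] = block        # a miss loads the whole block
    return misses


ELEM_SIZE = 8      # bytes per DOUBLE element (assumed)
BLOCK_SIZE = 512   # bytes per block, the figure used in the text
NUM_BLOCKS = 16    # hypothetical cache size in blocks
N = 1000000        # loop trip count from the quoted Fortran

# Byte offsets of VEC(1) .. VEC(N), assuming VEC starts on a block boundary.
vec_addresses = (i * ELEM_SIZE for i in range(N))
misses = count_misses(vec_addresses, BLOCK_SIZE, NUM_BLOCKS)
print(misses, N // misses)   # misses, and loop iterations per miss
```

Under those assumptions the model reports 15625 misses for the 1000000
accesses, i.e. one miss per 64 iterations.  The ratio depends only on block
size divided by element size: with a hypothetical 16-byte block it drops to
one miss per 2 iterations, which is worse but still not a miss on every
access.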

+Similarly, copying data from one bit
+of memory to another will be limited by the raw memory speed.  

Say what?

Tim Bray, New OED Project, U of Waterloo, Ontario