Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: How Caches Work
Summary: To cache or not to cache, that is the compiler's decision.
Keywords: Cache Bypass
Message-ID: <12855@pur-ee.UUCP>
Date: 13 Sep 89 00:19:34 GMT
References: <21936@cup.portal.com> <1082@cernvax.UUCP> <16306@watdragon.waterloo.edu> <8399@boring.cwi.nl> <3989@phri.UUCP>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 34

In article <3989@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	Here's a (possibly crazy) idea for cache design.  The current EUD ...
>
>	What if you segmented the virtual memory space (Oh no!  Not
>segmented address spaces again!  Shades of Intel!) so that the top bit was
>a hint to the cache on probable access patterns.  Variables which were
>expected to hit the cache a lot (SUM and I in the EUD) would be put in the
>"normal" part of the address space.  Variables which were expected to be
>sequential-access, and thus never to hit (VEC in the EUD), would be put in
>the other half of the address space.  The cache would know not to bother
>doing a tag match on this kind of access.  The advantages would be faster
>access time (a memory fetch should be faster than a cache miss followed by
>a memory fetch), but, more important, it wouldn't cause bogus cache flushes.

Look at Chi's PhD thesis (see my last posting).  The idea of using a bit to
control mapping is "older than dirt," and the idea of using it in sequential
machines to control cacheability is part of what Chi's thesis suggests.  The
catch is that you don't divide the address space -- you simply have the
compiler tag individual REFERENCES with the "don't cache me" bit.  The
reason the bit is ignored for addressing, rather than being used with
creative data layout, is that a variable X might be cached along one program
flow path but not cached along another.  Chi also gives a compiler
algorithm, with O(n) typical-case performance, for setting the "don't cache
me" bits... your "sequential access" rule is a VERY crude approximation of
it.  You get a BIG performance gain this way... results are also in Chi's
thesis.

						-hankd@ecn.purdue.edu

PS: Chi and I have come to call this mechanism "Cache Bypass."  Parts of
this work have appeared in a bunch of papers as well as in his PhD thesis...
I think the first was at HICSS 87; the most recent was at the SIGPLAN 89
conf. on programming lang. design & implementation.
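To make the bypass mechanism itself concrete, here is a minimal C sketch of
a direct-mapped cache lookup with a per-reference bypass bit.  It is a toy
model with assumed parameters (256 lines of 16 bytes each), not anything
taken from Chi's thesis or Roy's post; the point is just that a bypassed
access does no tag match and, more importantly, never evicts a line that a
heavily reused variable like SUM is sitting in.

	#include <stdint.h>

	#define NLINES     256	/* assumption: 256-line direct-mapped cache */
	#define LINE_SHIFT 4	/* assumption: 16-byte lines */

	struct line { uint32_t tag; int valid; };
	static struct line cache[NLINES];

	/* "bypass" is the per-reference bit the compiler attached to this
	   load.  Returns 1 on a cache hit, 0 on a miss or bypassed access. */
	int lookup(uint32_t addr, int bypass)
	{
		if (bypass)
			return 0;	/* straight to memory: no tag match,
					   no allocation, no bogus eviction */

		uint32_t idx = (addr >> LINE_SHIFT) & (NLINES - 1);
		uint32_t tag = addr >> (LINE_SHIFT + 8);  /* 8 = log2(NLINES) */

		if (cache[idx].valid && cache[idx].tag == tag)
			return 1;	/* hit */

		cache[idx].valid = 1;	/* miss: allocate, evicting old line */
		cache[idx].tag = tag;
		return 0;
	}

Sweeping through VEC with the bit set leaves the lines holding SUM and I
untouched -- the "no bogus cache flushes" win -- and each bypassed access
pays one memory fetch instead of a miss followed by a fetch.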
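And a sketch of why the bit has to ride on each REFERENCE rather than on
each address: below, X has a single address but two access patterns, so no
data-layout trick can get both paths right.  The LOAD_CACHED/LOAD_BYPASS
macros are hypothetical stand-ins for load opcodes with the bit clear or
set; in this host-C version they compile to ordinary loads.

	/* Hypothetical markers for the two load flavors; a real compiler
	   would set the don't-cache-me bit in the load instruction itself. */
	#define LOAD_CACHED(p) (*(p))	/* ordinary, cacheable load */
	#define LOAD_BYPASS(p) (*(p))	/* pretend the bypass bit is set */

	int f(int *x, int *vec, int n, int flag)
	{
		int sum = 0;
		if (flag) {
			/* On this path X is reused n times, so cache it; VEC
			   is swept once, sequentially, so bypass it. */
			for (int i = 0; i < n; i++)
				sum += LOAD_CACHED(x) * LOAD_BYPASS(&vec[i]);
		} else {
			/* On this path X is read exactly once, so bypass it
			   too.  A single per-ADDRESS bit on X cannot be right
			   for both paths; a per-reference bit can. */
			sum = LOAD_BYPASS(x);
		}
		return sum;
	}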