Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: How Caches Work
Summary: To cache or not to cache, that is the compiler's decision.
Keywords: Cache Bypass
Message-ID: <12855@pur-ee.UUCP>
Date: 13 Sep 89 00:19:34 GMT
References: <21936@cup.portal.com> <1082@cernvax.UUCP> <16306@watdragon.waterloo.edu> <8399@boring.cwi.nl> <3989@phri.UUCP>
Reply-To: hankd@pur-ee.UUCP (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 34

In article <3989@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	Here's a (possibly crazy) idea for cache design.  The current EUD ...
>
>	What if you segmented the virtual memory space (Oh no!  Not
>segmented address spaces again!  Shades of Intel!) so that the top bit was
>a hint to the cache on probable access patterns.  Variables which were
>expected to hit the cache a lot (SUM and I in the EUD) would be put in the
>"normal" part of the address space.  Variables which were expected to be
>sequential-access, and thus never to hit (VEC in the EUD), would be put in
>the other half of the address space.  The cache would know not to bother
>doing a tag match on this kind of access.  The advantages would be faster
>access time (a memory fetch should be faster than a cache miss followed by
>a memory fetch), but, more important, it wouldn't cause bogus cache flushes.

Look at Chi's PhD thesis (see my last posting).  The idea of using a bit to
control mapping is "older than dirt," and the idea of using it in sequential
machines to control cacheability is part of what Chi's thesis suggests.  The
catch is that you don't divide the address space -- you simply have the
compiler tag individual REFERENCES with the "don't cache me" bit.  The
reason the bit is ignored for addressing, rather than being used with
creative data layout, is that a variable X might be cached along one program
flow path but not cached along another.  Chi also gives a compiler
algorithm, with O(n) typical-case performance, for setting the "don't cache
me" bits... your "sequential access" rule is a VERY crude approximation of
it.  You get a BIG performance gain this way... results are also in Chi's
thesis.

						-hankd@ecn.purdue.edu

PS: Chi and I have come to call this mechanism "Cache Bypass."  Parts of
this work have appeared in a bunch of papers as well as in his PhD thesis...
I think the first was at HICSS 87; the most recent was at the SIGPLAN 89
conf. on programming lang. design & implementation.
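To make the bypass mechanism itself concrete, here is a minimal C sketch of
a direct-mapped cache lookup with a per-reference bypass bit.  It is a toy
model with assumed parameters (256 lines of 16 bytes each), not anything
taken from Chi's thesis or Roy's post; the point is just that a bypassed
access does no tag match and, more importantly, never evicts a line that a
heavily reused variable like SUM is sitting in.

	#include <stdint.h>

	#define NLINES     256	/* assumption: 256-line direct-mapped cache */
	#define LINE_SHIFT 4	/* assumption: 16-byte lines */

	struct line { uint32_t tag; int valid; };
	static struct line cache[NLINES];

	/* "bypass" is the per-reference bit the compiler attached to this
	   load.  Returns 1 on a cache hit, 0 on a miss or bypassed access. */
	int lookup(uint32_t addr, int bypass)
	{
		if (bypass)
			return 0;	/* straight to memory: no tag match,
					   no allocation, no bogus eviction */

		uint32_t idx = (addr >> LINE_SHIFT) & (NLINES - 1);
		uint32_t tag = addr >> (LINE_SHIFT + 8);  /* 8 = log2(NLINES) */

		if (cache[idx].valid && cache[idx].tag == tag)
			return 1;	/* hit */

		cache[idx].valid = 1;	/* miss: allocate, evicting old line */
		cache[idx].tag = tag;
		return 0;
	}

Sweeping through VEC with the bit set leaves the lines holding SUM and I
untouched -- the "no bogus cache flushes" win -- and each bypassed access
pays one memory fetch instead of a miss followed by a fetch.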
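And a sketch of why the bit has to ride on each REFERENCE rather than on
each address: below, X has a single address but two access patterns, so no
data-layout trick can get both paths right.  The LOAD_CACHED/LOAD_BYPASS
macros are hypothetical stand-ins for load opcodes with the bit clear or
set; in this host-C version they compile to ordinary loads.

	/* Hypothetical markers for the two load flavors; a real compiler
	   would set the don't-cache-me bit in the load instruction itself. */
	#define LOAD_CACHED(p) (*(p))	/* ordinary, cacheable load */
	#define LOAD_BYPASS(p) (*(p))	/* pretend the bypass bit is set */

	int f(int *x, int *vec, int n, int flag)
	{
		int sum = 0;
		if (flag) {
			/* On this path X is reused n times, so cache it; VEC
			   is swept once, sequentially, so bypass it. */
			for (int i = 0; i < n; i++)
				sum += LOAD_CACHED(x) * LOAD_BYPASS(&vec[i]);
		} else {
			/* On this path X is read exactly once, so bypass it
			   too.  A single per-ADDRESS bit on X cannot be right
			   for both paths; a per-reference bit can. */
			sum = LOAD_BYPASS(x);
		}
		return sum;
	}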