Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!bionet!agate!shelby!neon!carcoar!wilson
From: wilson@carcoar.Stanford.EDU (Paul Wilson)
Newsgroups: comp.arch
Subject: Re: Cache Size
Summary: garbage collected languages can use big caches
Keywords: garbage collection, locality of reference, cache size
Message-ID: <1990Feb26.022057.28461@Neon.Stanford.EDU>
Date: 26 Feb 90 02:20:57 GMT
References: <7393@pdn.paradyne.com> <76700146@p.cs.uiuc.edu> <1990Feb22.175317.12898@utzoo.uucp> <WAYNE.90Feb23091735@dsndata.uucp> <8168@pt.cs.cmu.edu>
Sender: news@Neon.Stanford.EDU (USENET News System)
Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson)
Organization: U. of Illinois at Chicago (UIC, *not* UofC or UIUC)
Lines: 60

In article <8168@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>In article <WAYNE.90Feb23091735@dsndata.uucp> wayne@dsndata.uucp 
>	(Wayne Schlitt) writes:
>>i am sure there _are_ real programs that can fit entirely in the 68030's
>>massive (:-) 256 byte data and instruction caches.  i am sure there
>>are a lot more real programs that can fit entirely in a 8k cache.  i
>>am sure that a lot of real programs will fit nicely in 32k-128k
>>caches.
>
>Programs certainly do vary. The transaction people at Concurrent say
>that their idea of a cache is a megabyte. There may also be programs
>with essentially no locality: this has been claimed for some network
>communications software.  (The point was that the packets would be
>processed exactly once, and then passed along, never to be seen
>again.)

Garbage collected languages have similar characteristics.  If you
use a generational garbage collector, you may be able to fit the
youngest generation entirely in the cache.  But for it to be 
worthwhile, you want a fairly big cache.

The problem is that you allocated up through a region of memory,
allocating mostly very-short-lived objects that you touch a
few times and then never reference again.  A while later, you
copy the minority of live objects to some other region, and
allocate upward through the same area again.  If you can
keep this whole area in cache, you don't repeatedly miss on it.

This *may* require a *big* cache.  Ben Zorn's thesis from Berkeley,
(Dec '89) shows that misses only drop off slowly for a large
direct-mapped cache like SPUR's, so that you might need
1-2MB caches to get really good hit rates.  I have a theory that
you might do much better with set-associative caches, if they're
around 200KB or more.  But the numbers aren't in yet.

Note that if your cache is not at least a bit bigger than your
youngest generation, you don't win -- you tend to evict blocks
before you reuse them -- LRU is your enemy here.  So the miss
rate is fairly insensitive to cache size until the cache
gets to be as large as the youngest generation.

A smarter replacement policy, which knew about this, could
probably cut misses significantly for caches too small to
hold the whole allocation area.  (But is it worth it?  Maybe.
It might do a good job for the above-mentioned networking
software as well, and some other things.)

>-- 
>Don		D.C.Lindsay 	Carnegie Mellon Computer Science

   -- Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680