Path: utzoo!attcan!uunet!mcsun!ukc!strath-cs!cs.glasgow.ac.uk!jack
From: jack@cs.glasgow.ac.uk (Jack Campin)
Newsgroups: comp.object
Subject: Re: Garbage collection -- difficulty, cost, generality
Message-ID: <4870@vanuata.cs.glasgow.ac.uk>
Date: 20 Mar 90 16:51:53 GMT
References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <4836@vanuata.cs.glasgow.ac.uk> <1990Mar15.232035.147@Neon.Stanford.EDU> <4862@vanuata.cs.glasgow.ac.uk> <1990Mar19.181029.14823@Neon.Stanford.EDU>
Reply-To: jack@cs.glasgow.ac.uk (Jack Campin)
Organization: COMANDOS Project, Glesga Yoonie, Unthank
Lines: 83
Summary:
Expires:
Sender:
Followup-To:
Keywords:


wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:

>> anyone got a nice closure-bashing benchmark that we could try
>> in Smalltalk, Scheme, Poplog and PS-algol to firm this up a bit?

> There's a closure-intensive benchmark or two in the appendix to David Kranz'
> thesis on Orbit (the T Compiler) from Yale.  I don't know whether that's
> the right thing to test, though.  It's hard to come up with a fair closure
> benchmark that will work across languages.  In some languages you might
> use closures in certain situations, but in others you might not, even
> if they're available.  For example, in vanilla Scheme (or PS-Algol)
> you might use closures to implement objects (which can get you in 
> trouble performance-wise -- there's another subtle interaction with 
> generational gc's).  But in T, a full object system is provided, so 
> you'd use that instead.

I don't see why closures and explicitly created objects should affect gc
techniques differently.  The way PS-algol does closures certainly can,
but that's a detail of implementation.  (Too much is retained, since the
abstract machine objects that hold the environment of a partially applied
procedure are way too big and flat).  As the storage system sees it,
closures and objects are the same thing.


>>> what kind of programs were used for testing?
>> Recompiling the compiler.
> Depending on your compiler, this may or may not be a reasonable test.  For
> our bytecode compiler, our generational gc is extremely effective.  If hte
> first generation is of reasonable size, nothing except the actual generated 
> code (and associated data structures like literal frames) survives to be
> advanced out of the first generation.  When the compiler starts compiling
> modules instead of just independent procedures, we expect the copying 
> costs to go up significantly.   (At a gc, we'll typically have some
> intermediate data structures of a whole module to copy, as well as those
> involved in compiling a single procedure.)  Further in this direction
> is Lucid's Common Lisp compiler, which may have a *megabyte* of inter-
> procedural analysis goop lying around when doing an optimized system build.
> For that kind of "serious" stuff you'd want a serious generational gc.

The test I was quoting would have taken 12MB of heap to run without gc;
the heap was constrained to be 1MB. The compiler takes a fairly simple-
minded approach and passes the entire parse tree between the parser and
code generator in one lump.  How does this relate to your setup?


> I seem to recall that PS-Algol used to have a rather strange allocation/gc 
> scheme, with objects allocated in areas by type for locality reasons.  Were
> any locality effects ever measured?  (Also, I seem to recall someone saying
> that there was a possibility of storage leaks in the P-store, under certain
> circumstances, or something like that.  Is the same scheme in use, or has 
> it been modified?)

This is correct.  The persistent store is partitioned into containers
(mis)named "databases".  The idea of this model is that programmers would
control inter-database references so that the databases formed a directed
acyclic graph; the leaf databases could then be garbage-collected
individually without disturbing others.  I implemented a persistent store
garbage collector that took advantage of this; it permitted storage leaks,
but these were completely avoidable by any sane PS-algol programmer.

The locality effects are thus very dependent on what people do with the
system.  The problem was getting our users to realize the persistent store
implications of their programs' name structures.  Let enough naive student
hackers loose with no documentation, and inter-database references will
rapidly degenerate into a spaghetti mess that can only be globally gc'd.
We have another gc that does mark/sweep on the whole persistent universe -
obviously, this is slower - and didn't have much option but to use it
instead (this one clears up storage leaks).  Forethought and a tougher
attitude to system administration would have prevented this debacle - but
since nobody else at the time had a persistent store capable of
accumulating a ragbag of inter-referential student projects for several
years, we couldn't predict what would happen.

A final question: how do object sizes affect GC techniques?  Most of the
fans of generational methods seem to be LISPers, whose objects are mostly
the same size.  PS-algol objects are more or less Poisson-distributed (mean
50-60 bytes) and I suspect that Smalltalk and Poplog ones are too.
-- 
--  Jack Campin   Computing Science Department, Glasgow University, 17 Lilybank
Gardens, Glasgow G12 8QQ, Scotland   041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk    BANG!net: via mcvax and ukc   FAX: 041 330 4913
INTERNET: via nsfnet-relay.ac.uk   BITNET: via UKACRL   UUCP: jack@glasgow.uucp