Path: utzoo!attcan!uunet!mcsun!ukc!strath-cs!cs.glasgow.ac.uk!jack From: jack@cs.glasgow.ac.uk (Jack Campin) Newsgroups: comp.object Subject: Re: Garbage collection -- difficulty, cost, generality Message-ID: <4870@vanuata.cs.glasgow.ac.uk> Date: 20 Mar 90 16:51:53 GMT References: <1990Mar13.011146.6019@Neon.Stanford.EDU> <2356@syma.sussex.ac.uk> <4836@vanuata.cs.glasgow.ac.uk> <1990Mar15.232035.147@Neon.Stanford.EDU> <4862@vanuata.cs.glasgow.ac.uk> <1990Mar19.181029.14823@Neon.Stanford.EDU> Reply-To: jack@cs.glasgow.ac.uk (Jack Campin) Organization: COMANDOS Project, Glesga Yoonie, Unthank Lines: 83 Summary: Expires: Sender: Followup-To: Keywords: wilson@carcoar.Stanford.EDU (Paul Wilson) wrote: >> anyone got a nice closure-bashing benchmark that we could try >> in Smalltalk, Scheme, Poplog and PS-algol to firm this up a bit? > There's a closure-intensive benchmark or two in the appendix to David Kranz' > thesis on Orbit (the T Compiler) from Yale. I don't know whether that's > the right thing to test, though. It's hard to come up with a fair closure > benchmark that will work across languages. In some languages you might > use closures in certain situations, but in others you might not, even > if they're available. For example, in vanilla Scheme (or PS-Algol) > you might use closures to implement objects (which can get you in > trouble performance-wise -- there's another subtle interaction with > generational gc's). But in T, a full object system is provided, so > you'd use that instead. I don't see why closures and explicitly created objects should affect gc techniques differently. The way PS-algol does closures certainly can, but that's a detail of implementation. (Too much is retained, since the abstract machine objects that hold the environment of a partially applied procedure are way too big and flat). As the storage system sees it, closures and objects are the same thing. >>> what kind of programs were used for testing? >> Recompiling the compiler. > Depending on your compiler, this may or may not be a reasonable test. For > our bytecode compiler, our generational gc is extremely effective. If hte > first generation is of reasonable size, nothing except the actual generated > code (and associated data structures like literal frames) survives to be > advanced out of the first generation. When the compiler starts compiling > modules instead of just independent procedures, we expect the copying > costs to go up significantly. (At a gc, we'll typically have some > intermediate data structures of a whole module to copy, as well as those > involved in compiling a single procedure.) Further in this direction > is Lucid's Common Lisp compiler, which may have a *megabyte* of inter- > procedural analysis goop lying around when doing an optimized system build. > For that kind of "serious" stuff you'd want a serious generational gc. The test I was quoting would have taken 12MB of heap to run without gc; the heap was constrained to be 1MB. The compiler takes a fairly simple- minded approach and passes the entire parse tree between the parser and code generator in one lump. How does this relate to your setup? > I seem to recall that PS-Algol used to have a rather strange allocation/gc > scheme, with objects allocated in areas by type for locality reasons. Were > any locality effects ever measured? (Also, I seem to recall someone saying > that there was a possibility of storage leaks in the P-store, under certain > circumstances, or something like that. Is the same scheme in use, or has > it been modified?) This is correct. The persistent store is partitioned into containers (mis)named "databases". The idea of this model is that programmers would control inter-database references so that the databases formed a directed acyclic graph; the leaf databases could then be garbage-collected individually without disturbing others. I implemented a persistent store garbage collector that took advantage of this; it permitted storage leaks, but these were completely avoidable by any sane PS-algol programmer. The locality effects are thus very dependent on what people do with the system. The problem was getting our users to realize the persistent store implications of their programs' name structures. Let enough naive student hackers loose with no documentation, and inter-database references will rapidly degenerate into a spaghetti mess that can only be globally gc'd. We have another gc that does mark/sweep on the whole persistent universe - obviously, this is slower - and didn't have much option but to use it instead (this one clears up storage leaks). Forethought and a tougher attitude to system administration would have prevented this debacle - but since nobody else at the time had a persistent store capable of accumulating a ragbag of inter-referential student projects for several years, we couldn't predict what would happen. A final question: how do object sizes affect GC techniques? Most of the fans of generational methods seem to be LISPers, whose objects are mostly the same size. PS-algol objects are more or less Poisson-distributed (mean 50-60 bytes) and I suspect that Smalltalk and Poplog ones are too. -- -- Jack Campin Computing Science Department, Glasgow University, 17 Lilybank Gardens, Glasgow G12 8QQ, Scotland 041 339 8855 x6044 work 041 556 1878 home JANET: jack@cs.glasgow.ac.uk BANG!net: via mcvax and ukc FAX: 041 330 4913 INTERNET: via nsfnet-relay.ac.uk BITNET: via UKACRL UUCP: jack@glasgow.uucp