Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!decwrl!parc!boehm
From: boehm@parc.xerox.com (Hans Boehm)
Newsgroups: comp.lang.c++
Subject: Re: reference counting vs. scanning GC
Keywords: locality, large address spaces, reference counting, garbage collection
Message-ID: <boehm.673720634@siria>
Date: 8 May 91 16:37:14 GMT
References: <TMB.91Apr24033353@volterra.ai.mit.edu| <1991Apr24.192424.23745@kestrel.edu> <72149@microsoft.UUCP>
Sender: news@parc.xerox.com
Organization: Xerox PARC
Lines: 63

jimad@microsoft.UUCP (Jim ADCOCK) writes:

>In article <1991Apr24.192424.23745@kestrel.edu| gyro@kestrel.edu (Scott Layson Burson) writes:
>|This is very definitely a concern I have about C++: that when people
>|start manipulating 60 Mb worth of objects in one program (and I have
>|already seen 30 Mb process sizes), that the locality problem is going
>|to bring their machines to their knees, even though they would have
>|had no trouble running the same algorithms on the same data in Lisp.
>|I don't know of any allocation algorithms for C++ that pay any
>|attention to locality, nor do I really see how there *can* be any,
>|because the information about what other objects an object X points to
>|or is pointed to by is not available at the time X is created.
>|Furthermore, there is no way of doing compaction of live objects.
>|
>|As I say, I am very concerned about this.  I think the C++ community
>|is not prepared for the magnitude of the problem.

>Agreed.  ...

>A more fundamental problem is that any fixed placement strategy assumes
>that the patterns of usage of objects is going to remain fixed, but this
>is not true of any non-trivial app.  Thus, the only way to maintain good
>locality of reference over the long run is to have moveable objects.
>As an extreme, see Jul's thesis on Emerald, where objects are moved 
>"automatically" over the net to the process where they are needed.

I agree that in a distributed memory system this makes sense.  In the
uniprocessor case, the issues are less clear.  There is some reason to
believe that objects allocated close together in time are likely to be
accessed together.  (I believe Paul Wilson's paper in the upcoming SIGPLAN
conference makes this point, but I'm sure there are better references.)
Fixed placement allocators typically allocate successively allocated
objects close to each other.  (If they don't, they should.)  Thus there
is some reason to believe that fixed placement allocators don't do that
badly with respect to locality.

Moving collectors uniformly win in that they avoid fragmentation.  But,
at least in our environment (Cedar with 30 MB or so heap and some long
lived sessions), this isn't much of an issue even with fixed allocation.

A compacting trace-and-sweep collector obviously wins over a noncompacting
one in terms of locality.  (Actually, in the presence of nonuniform
sized objects, even that's not completely clear, but it seems likely.)
But it tends to be expensive in other respects.  Copying collectors
have their own problems.  You have to be fairly clever to get any
reasonable locality.  (My intuition is that breadth-first copying is
significantly worse than fixed allocation.  I don't have numbers to
either back that up or to refute it.)  Copying on reference wins
substantially for at least one application (see Johnson's paper in
ASPLOS '91), but that's hard to do well on a conventional machine.  A
full copying collection is likely to be a disaster, given the current
ratio of cpu to disk speeds.  Moving collectors don't get along with
some programming techniques (e.g. hashing on addresses).

A comparison of some of these approaches can be found in John
DeTreville, "Experience with Concurrent Garbage Collectors for Modula-2+",
DEC SRC report no. 64.  Ben Zorn's paper in the 1990 Lisp conference also
compares copying and trace&sweep collection in one particular context.
If anybody knows of any other real data (that's not completely obsolete),
I would love to know about it.

Hans
(boehm@xerox.com)