Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!elroy.jpl.nasa.gov!decwrl!parc!boehm From: boehm@parc.xerox.com (Hans Boehm) Newsgroups: comp.lang.c++ Subject: Re: reference counting vs. scanning GC Keywords: locality, large address spaces, reference counting, garbage collection Message-ID: Date: 8 May 91 16:37:14 GMT References: <72149@microsoft.UUCP> Sender: news@parc.xerox.com Organization: Xerox PARC Lines: 63 jimad@microsoft.UUCP (Jim ADCOCK) writes: >In article <1991Apr24.192424.23745@kestrel.edu| gyro@kestrel.edu (Scott Layson Burson) writes: >|This is very definitely a concern I have about C++: that when people >|start manipulating 60 Mb worth of objects in one program (and I have >|already seen 30 Mb process sizes), that the locality problem is going >|to bring their machines to their knees, even though they would have >|had no trouble running the same algorithms on the same data in Lisp. >|I don't know of any allocation algorithms for C++ that pay any >|attention to locality, nor do I really see how there *can* be any, >|because the information about what other objects an object X points to >|or is pointed to by is not available at the time X is created. >|Furthermore, there is no way of doing compaction of live objects. >| >|As I say, I am very concerned about this. I think the C++ community >|is not prepared for the magnitude of the problem. >Agreed. ... >A more fundamental problem is that any fixed placement strategy assumes >that the patterns of usage of objects is going to remain fixed, but this >is not true of any non-trivial app. Thus, the only way to maintain good >locality of reference over the long run is to have moveable objects. >As an extreme, see Jul's thesis on Emerald, where objects are moved >"automatically" over the net to the process where they are needed. I agree that in a distributed memory system this makes sense. In the uniprocessor case, the issues are less clear. There is some reason to believe that objects allocated close together in time are likely to be accessed together. (I believe Paul Wilson's paper in the upcoming SIGPLAN conference makes this point, but I'm sure there are better references.) Fixed placement allocators typically allocate successively allocated objects close to each other. (If they don't, they should.) Thus there is some reason to believe that fixed placement allocators don't do that badly with respect to locality. Moving collectors uniformly win in that they avoid fragmentation. But, at least in our environment (Cedar with 30 MB or so heap and some long lived sessions), this isn't much of an issue even with fixed allocation. A compacting trace-and-sweep collector obviously wins over a noncompacting one in terms of locality. (Actually, in the presence of nonuniform sized objects, even that's not completely clear, but it seems likely.) But it tends to be expensive in other respects. Copying collectors have their own problems. You have to be fairly clever to get any reasonable locality. (My intuition is that breadth-first copying is significantly worse than fixed allocation. I don't have numbers to either back that up or to refute it.) Copying on reference wins substantially for at least one application (see Johnson's paper in ASPLOS '91), but that's hard to do well on a conventional machine. A full copying collection is likely to be a disaster, given the current ratio of cpu to disk speeds. Moving collectors don't get along with some programming techniques (e.g. hashing on addresses). A comparison of some of these approaches can be found in John DeTreville, "Experience with Concurrent Garbage Collectors for Modula-2+", DEC SRC report no. 64. Ben Zorn's paper in the 1990 Lisp conference also compares copying and trace&sweep collection in one particular context. If anybody knows of any other real data (that's not completely obsolete), I would love to know about it. Hans (boehm@xerox.com)