Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!spool.mu.edu!snorkelwacker.mit.edu!ai-lab!life!tmb From: tmb@ai.mit.edu (Thomas M. Breuel) Newsgroups: comp.lang.c++ Subject: reference counting vs. scanning GC (was: Implementing LISP in C++) Message-ID: Date: 24 Apr 91 07:33:53 GMT Article-I.D.: volterra.TMB.91Apr24033353 Sender: news@ai.mit.edu Organization: MIT Artificial Intelligence Lab Lines: 61 In-reply-to: marti@mint.inf.ethz.ch's message of 23 Apr 91 08:45:55 GMT In article <1991Apr23.000632.9175@neon.Stanford.EDU> hoelzle@neon.Stanford.EDU (Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu (Thomas M. Breuel): >>[Any kind of garbage collection other than reference counting] will have >>much higher overhead [in the case of non-recursive data structures]. This is quoted incorrectly, and it does not represent what I was saying. >What is the reason why this must be so? Do you have any numbers to >show that reference counting is superior in this case? (Read: I don't >believe it.) Neither do I. In fact, I was surprised to find that using the conservative garbage collector due to Hans Boehm and Mark Weiser (and described in Software Practice and Experience in 1988?) instead of doing reference counting myself was _considerably_ faster in a C++ implementation which used many dynamically allocated arrays. If my memory serves me correctly, the running times dropped to about 60% of the original ones. (And, yes, the application ran long enough to incur several garbage collections!) Reference counting incurs costs at each copy and assignment operation involving references to data. For most applications that makes reference counting inferior to scanning garbage collectors. However, while the incremental cost of collecting another object with scanning garbage collectors is considerably smaller than the bookkeeping cost required for reference counting that object, the startup cost for a scanning garbage collector is much higher compared to the zero startup cost for a reference counting system. This is the case because you have to scan through a considerable number of objects to determine what is alive and what is dead before you can start any storage reclamation (generational schemes reduce, but do not eliminate, this overhead). Now, if you have some resource of which you can allocate only a small number of items (say 10-100) before you need to garbage collect, the overhead associated with scanning garbage collectors can be much more costly than the overhead associated with updating reference counts. Typical examples of such situations are numerical applications written in a functional style and data parallel languages like *Lisp (other examples are file descriptors, network ports, communication ports, and other kinds of very limited hardware resources). In a serial numerical program written in a functional style, you will easily generate garbage at rates of 10M/sec, meaning that you have to garbage collect very frequently. The rates are much higher for data level parallel languages. Such languages have the additional useful feature that those objects for which reference counting makes sense can also never be circular, which means that reference counting not only is the most efficient garbage collection technique, but that it also is completely sufficient. Reference counting will most likely also give you better locality of reference as well, since recently freed objects are likely to have been accessed recently as well. For C++, the implication of this is that, while it may make sense to use scanning garbage collectors for most of the garbage, reference counting is a sensible technique for large floating point and integer vectors, which is all I was claiming to begin with.