Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!spool.mu.edu!snorkelwacker.mit.edu!ai-lab!life!tmb
From: tmb@ai.mit.edu (Thomas M. Breuel)
Newsgroups: comp.lang.c++
Subject: reference counting vs. scanning GC (was: Implementing LISP in C++)
Message-ID: <TMB.91Apr24033353@volterra.ai.mit.edu>
Date: 24 Apr 91 07:33:53 GMT
Article-I.D.: volterra.TMB.91Apr24033353
Sender: news@ai.mit.edu
Organization: MIT Artificial Intelligence Lab
Lines: 61
In-reply-to: marti@mint.inf.ethz.ch's message of 23 Apr 91 08:45:55 GMT


   In article <1991Apr23.000632.9175@neon.Stanford.EDU> hoelzle@neon.Stanford.EDU
   (Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu
   (Thomas M. Breuel):
   >>[Any kind of garbage collection other than reference counting] will have
   >>much higher overhead [in the case of non-recursive data structures].

This is quoted incorrectly, and it does not represent what I was saying.

   >What is the reason why this must be so?  Do you have any numbers to
   >show that reference counting is superior in this case?  (Read: I don't
   >believe it.)

   Neither do I.  In fact, I was surprised to find that using the conservative
   garbage collector due to Hans Boehm and Mark Weiser (and described in
   Software Practice and Experience in 1988?) instead of doing reference
   counting myself was _considerably_ faster in a C++ implementation which
   used many dynamically allocated arrays.  If my memory serves me correctly,
   the running times dropped to about 60% of the original ones.  (And, yes,
   the application ran long enough to incur several garbage collections!)

Reference counting incurs costs at each copy and assignment operation
involving references to data. For most applications that makes reference
counting inferior to scanning garbage collectors.

However, while the incremental cost of collecting another object with
scanning garbage collectors is considerably smaller than the
bookkeeping cost required for reference counting that object, the
startup cost for a scanning garbage collector is much higher compared
to the zero startup cost for a reference counting system. This is the
case because you have to scan through a considerable number of objects
to determine what is alive and what is dead before you can start any
storage reclamation (generational schemes reduce, but do not
eliminate, this overhead).

Now, if you have some resource of which you can allocate only a small
number of items (say 10-100) before you need to garbage collect, the
overhead associated with scanning garbage collectors can be much more
costly than the overhead associated with updating reference counts.

Typical examples of such situations are numerical applications written
in a functional style and data parallel languages like *Lisp (other
examples are file descriptors, network ports, communication ports, and
other kinds of very limited hardware resources). In a serial numerical
program written in a functional style, you will easily generate
garbage at rates of 10M/sec, meaning that you have to garbage collect
very frequently. The rates are much higher for data level parallel
languages.

Such languages have the additional useful feature that those objects
for which reference counting makes sense can also never be circular,
which means that reference counting not only is the most efficient
garbage collection technique, but that it also is completely
sufficient. Reference counting will most likely also give you better
locality of reference as well, since recently freed objects are likely
to have been accessed recently as well.

For C++, the implication of this is that, while it may make sense
to use scanning garbage collectors for most of the garbage, reference
counting is a sensible technique for large floating point and integer
vectors, which is all I was claiming to begin with.