Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!columbia!rutgers!ames!ptsfa!hoptoad!academ!soma!rice!titan!rbbb
From: rbbb@titan.rice.edu@rice.EDU (David Chase)
Newsgroups: comp.lang.lisp
Subject: Re: Garbage collection question
Message-ID: <449@dione.rice.EDU>
Date: Thu, 28-May-87 14:02:51 EDT
Article-I.D.: dione.449
Posted: Thu May 28 14:02:51 1987
Date-Received: Sun, 31-May-87 19:41:17 EDT
References: <4631@columbia.UUCP>
Sender: root@rice.EDU
Reply-To: rbbb@titan.rice.edu (David Chase)
Distribution: world
Organization: Rice University, Houston
Lines: 105

In article <4631@columbia.UUCP> yoram@cs.columbia.edu (Yoram Eisenstadter) writes:
>In June 1983, a paper by Lieberman and Hewitt ....
>I have the following questions:
>
>- Does anybody know of any empirical results on the performance
>of this particular algorithm?
>

The garbage collector the Symbolics 3600 is (was) similar.  Try reading
"Garbage Collection in a Large Lisp System", by Moon, 1984 SIGPLAN Symposium
on LISP and Functional Programming, and "Architecture of the Symbolics 3600",
by Moon, 1985 International Symposium on Computer Architecture.

>- Does anybody know if anybody ever collected data on the
>following properties of Lisp programs?  (What about
>object-oriented programs?)
>
>        (1) Rate of object creation
>        (2) Average lifetime of objects
>        (3) Proportion of "forward" pointers, which point from
>            one object to a more recently allocated object
>        (4) Locality of reference, i.e., do programs point
>            to "nearby" or "far away" objects
>
Yes.  Yes.
TITLE = "Compact Encodings of List Structure",
AUTHOR = "Daniel G. Bobrow and Douglas W. Clark",
JOURNAL = toplas, VOLUME = 1, NUMBER = 2, MONTH = oct, YEAR = 1979,

AUTHOR = "Douglas W. Clark",
TITLE = "Measurements of Dynamic List Structure Use in {L}isp",
JOURNAL = ieeese, VOLUME = 5, NUMBER = 1, MONTH = jan, YEAR = 1979,

AUTHOR = "D. W. Clark and C. C. Green",
TITLE = "An Empirical Study of List Structure in {LISP}",
JOURNAL = cacm, VOLUME = 20, NUMBER = 2, YEAR = 1977, MONTH = feb,

>- What are the various tradeoffs among garbage collection
>algorithms?  I see lots of systems with "stop-and-copy" garbage
>collectors, even though incremental garbage collectors are much
>more user-friendly.  What is it about incremental collectors that
>prevents everybody from implementing them?  Complexity?
>Inefficiency?

1) Tradeoffs, in no particular order:
   collection delay (essentially zero in a truly incremental system)
   collection "software" overhead
   collection "hardware" overhead
   need for help from the programmer
   language restrictions

Software overhead is the time spent (or not saved) in maintaining
invariants on the run-time environment for the sake of the collector.
This includes both tag-bit manipulations and optimizations left undone.

Hardware overhead is the extra memory needed to store tag bits, the
microcode (I'll call it hardware, anyway) assist, and the "barrier" on the
memory system.

Need for help from the programmer.  Some systems work much better if the
programmer helps out, either by providing clues or by explicitly reusing
storage.

Language restrictions include (for example) the ability to perform static
type-checking and forbidding the use of "upward FUNARGS" (using a function with
lexically bound variables as a return value).  Static type-checking can
permit the use of specially-compiled garbage collectors and allocation from
"pointer-free" storage (need not be traced).  Disallowing upward FUNARGs
guarantees that activation records can be stack allocated.  Compiler analysis
can also discover this in many situations.

People are less likely to implement "real-time" collectors (as described by
Baker in CACM 21(4)) because of the overhead placed on CAR and CDR
operations.  This overhead can be shifted to hardware (as in the Symbolics
machines), but then you must pay for the extra hardware.  The problem with
special-purpose hardware is that it is, well, special purpose.  There is less
demand for it, and there are fewer things that you can do with it.  Because
there is less demand for it, chip manufacturers don't work hard to make it go
fast.  Look at how the 68000 series has improved over the last 7 years, for
example.  With a Sun 3 and a smart compiler, you can compete with a 360o.  If
you decide that you hate LISP, then there is still a market for used Suns.

There was also a concurrent collector implemented for Cedar Mesa at Xerox.
It ran every so often, mostly concurrent with ordinary execution.  See "On
Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically
Checked, Concurrent Language", Xerox PARC TR CSL-84-7.  It helps to first
read "An efficient, incremental, automatic garbage collector" by Deutsch and
Bobrow, CACM 19(9).

Also, there has been a shift toward RISC machines in the last few years, and
so far those (except for work at UCB) tend not to help with garbage
collection.  The work at Berkeley has been in support of a not-really-
incremental collector that apparently works very very well.  See David
Ungar's article in Proc. of the ACM SIGSOFT/SIGPLAN Soft. Eng. Symp. on
Practical Software Development Environments.  His collector runs every few
seconds, but the collection times are quite fast (not "real-time", but
suitable for interactive use.  I've even met "satisfied customers").

I could say more, but I had better stop now.  There's a LOT to be said about
garbage collection.

Oh yeah, object-oriented statistics.  See "Smalltalk-80: Bits of History,
Words of Advice", ed. Glenn Krasner, Addison Wesley, 1983.

David Chase