Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!wuarchive!uwm.edu!bionet!arisia!roo!boehm From: boehm@parc.xerox.com (Hans Boehm) Newsgroups: comp.object Subject: Re: Precise GC (was Re: Do we really need types in OOPL's?) Message-ID: <574@roo.UUCP> Date: 18 Oct 90 18:30:28 GMT References: <736@tetrauk.UUCP> <18261.27131385@qut.edu.au> <563@roo.UUCP> <1990Oct16.015230.24169@cbnewsc.att.com> Sender: news@parc.xerox.com Lines: 91 In article <1990Oct16.015230.24169@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes: >In article <563@roo.UUCP>, boehm@parc.xerox.com (Hans Boehm) writes: >> "Precise" garbage collection is a rather imprecise concept and thus, >> I would argue, a funny absolute criterion to use in language selection. >> I know of many gradations of "conservativism" in garbage collection. >I was referring to "exact" garbage collection, which collects *all* >inaccessible storage. My application must run continuously over long >periods of time without leaking memory. *Any* degree of conservatism >is too conservative for me (in the matter of GC, anyway). The problem again is that "accessibility" of storage is not defined by any language specification that I know of (with SELF apparently coming closer than most). In practice, it tends to be defined both by the collector and the compiler. Imposing restrictions on the collector isn't enough. >> 4) Pointer location information maintained on stack and registers (i.e. tagged >> pointers or a compiler generated map of where pointers are at various >> programs points). I believe Standard ML of NJ, and the original KCL >> fit here. >Bingo! Again, this isn't enough to guarantee anything interesting. Here's another, somewhat more involved, example. Assume that I have a program P that manipulates a queue Q, which is implemented as a singly linked list with head and tail pointers H and T. Consider the following pseudo-program: (Scopes are indicated by indentation level.) let H = a pointer variable; T = a pointer variable; F = a function variable of appropriate type; in let first = initial Queue entry (an allocated object); in let f = function not mentioning first; g = function mentioning first; in H := first; T := first; F := f; g(); P; Assume that P runs for a long time, and adds and removes many items from Q. Assume further that the length of the queue is bounded. Assume also that it does no other allocation, and that it calls F occasionally. Does this program consume more and more storage? It does if a reference to first is retained. This can happen because one of the following holds: 1) The collector is conservative, and there is a bogus reference to first. 2) first was allocated in as a tagged pointer on the stack, and the location was not cleared, and it was not overwritten by P's data. (This is likely on SPARC-like architectures that tend to leave lots of holes in the stack.) 3) The compiler decided to represent the function f as an pair. The environment for f contains the binding for first. 4) The compiler copies bindings into closures, but tries to share closures if possible. It created a single record containing the bindings used by both f and g. Since f is still accessible, it will retain the binding of first. 5) The compiler performed some other baroque optimization that extended the lifetime of first. None of these optimizations are forbidden by most language specifications. Sometimes they can be very profitable. (My impression is that SML of NJ does 4, though perhaps not in this case. Nobody other than the implementors would know.) Generational collection can also introduce similar phenomena if your heap gets big enough that you can't afford periodic full collections. If you are writing a long running server, it is possible to program defensively against such things. But the same is true for conservative collectors. Empirically, long running servers usually work just fine with conservative collection. There is a partial explanation of this, and a good discussion of an EXTREME form of conservative garbage collection in Wentworth, E. P., "Pitfalls of Conservative Garbage Collection", SP&E 20, 7 (July 1990) pp. 719-727. It points out when conservative garbage collection works even in very densely occupied address spaces, and when it doesn't. I claim all the failures he discusses can also occur with nonconservative collection and compiler optimization. He just cranked up the probabilities high enough to make them easily observable. Hans