Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!athene!pcg From: pcg@cs.aber.ac.uk (Piercarlo Grandi) Newsgroups: comp.object Subject: Re: C++ and garbage collection Message-ID: Date: 19 Sep 90 12:56:18 GMT References: <1990Sep12.074154.657@tukki.jyu.fi> <1990Sep16.090226.11051@tukki.jyu.fi> <57523@microsoft.UUCP> Sender: pcg@aber-cs.UUCP Organization: Coleg Prifysgol Cymru Lines: 152 Nntp-Posting-Host: teacha In-reply-to: jimad@microsoft.UUCP's message of 18 Sep 90 17:50:14 GMT On 18 Sep 90 17:50:14 GMT, jimad@microsoft.UUCP (Jim ADCOCK) said: jimad> Seems to me for any GC system to work, there are a set of rules jimad> on fair usage of objects and references that must be obeyed. Just to be more precise: any program that uses a reclaimer has an interface contract with it, which must be respected. The interesting question is not whether the contract is more or less strict or fair, but whether the contract is more or less limiting than the rules of the language in which the program is written. It is my contention that automatic reclamation in the guise of conservative g.c. requires a *less* restrictive interface contract than the manual reclaimer for a language like C++. Just to make things clearer, I think it has come the time for me to summarize my understanding of the storage reclamation problem, so that at least my own "hidden assumptions" on which I base my reasoning are made explicit. Storage reclamation is based on the assumption that it is possible to identify all portions of data space that will be used in the future by the program. As a proxy to this unimplementable notion we use instead the stronger one that it is presumed that a portion of data space will be used in the future if there are valid references to it, i.e. if it just *possible* for the program to still reach it. Conceptually therefore storage reclamation consists of two conceptual activities, identifying which data areas are still reachable, and putting back in the pool of free space any data space that is not reachable. Note that because at least of the latter requirement, the storage reclaimer's deallocator component and the allocator must be connected; the three together form the storage manager. Automatic storage reclamation is the case where identification of reachable data space is done by a generalized, program independent, algorithm; manual reclamation is the case where each program does the identification process, and is given direct access to the storage deallocator. A garbage collecting automatic reclaimer is the case where identification of reachable data space is performed (at least in concept) by examining a snapshot of program memory; a reference counting automatic reclaimer is one (I cannot think of others though) case where identification is done by (at least in concept) examining dynamically how a program references data space. Normally there are several storage managers supporting a program's execution (just consider PL/1 or Modula-3, for example); for example often automatic variables are allocated on a stack associated with a very simplified reference counting storage manager (in Algol 68 the user is, unusually, given direct access to the stack manager's storage allocator, called LOC). Here we are interested in heap (i.e. arbitrary, non nested scope) storage managers, and in particular in those that manage their heaps using garbage collection as an automatic storage reclamation policy. We also are interested in the further subset of garbage collectors for which a "reference" is a storage address, or pointer. It is conceivable, e.g. to support referential integrity in a database system, to have garbage collectors to which a reference is a key represented as a string -- we are not discussing any of there here, even if it would be nearly the same, for as long as the definition of "reference" for both program and storage manager is the same, the problem does not really change. So, what does a garbage collector needs to know? 1) the extent of the heap(s) it must work upon 2) the pointers from outside the heap into the heap 3) which object a pointer points *into* 4) the size of an object given a pointer to it 5) which "words" in an object hold pointers Problems 2), 4) and 5) are often solved by having a type table available at run time to the storage manager and tagging each object as to its type; problem 1) is usually solved by having the compiler tag pointers in the stack and static areas (or any others) as well; problem 3), which is among the thorniest, is usually solved by forbidding pointers in the middle or an object, by using for that purpose a pointer+offset pair, of by forbidding pointers to the middle of an object altogether (Simula-67, Smalltalk, Eiffel...), and only using whole-object references throughout. Unfortunately C/C++ compilers do not produce any runtime information for the storage reclaimer. Fortunately a conservative garbage collector can collect *most* garbage using weaker technology than that required in points 1)-5). A conservative garbage collector storage reclaimer will be associated with a BIBOP style storage allocator; there will be a separate heap for each different size of object, and the collector will have access to the table describing the extent of each heap, and the size of object contained therein. Given a pointer it thus becomes easy to know whether it points into some heap, which heap, and which object within that heap it denotes. The only problems that remain are therefore 2) and 5), knowing which pointers point into the heap, from outside or outside. Here we have conservativeness: we scan all of non heap memory, and every word (we assume that pointers will be aligned) whose numeric value interpreted as an address lies between the bounds of an heap section is taken to be one. The same is done for every object so found; all words in every object that contain a numeric value that could be an address into a heap are assumed to hold pointers, and so on recursively. Notice that this process does not depend on how pointers were generated at all; it will only and always collect garbage, because all pointers to an heap object will have been correctly identified as such. It will not collect *all* garbage, because some words that contain integers or strings or floating point numbers will have been conservatively mistaked for addresses into the heap, and will have protected some garbage from being reclaimed. Notice that this is in practice extremely rare, taking a little bit of care, because it is not difficult to ensure that pointer into the heaps take numeric values that rarely occur as integers, strings, floats. The efficiency of conservative garbage collection, measured as the percentage of garbage that is not reclaimed, has been proven to be quite high; this is because on many common os/architecture pairs only few words that are not pointers into the heap look like they were, even without taking a lot of care, so that little garbage is left unreclaimed. Why? because on many machines the heaps are segments in fairly high storage addresses, the typical heap address being often a bit pattern like 0x0a00d1fc0 Most integers are not like this (almost all integers are < 100), neither are most floats (this looks like being denormalized), neither are most strings (strings are usually made up of alphanumerical characters, and are often longer), etc... Readily accesible papers on conservative garbage collection are those of Boehm, Bertlett and Edelson. I have already posted a compilation of information on this subject a while ago. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk