Path: utzoo!attcan!uunet!wuarchive!gem.mps.ohio-state.edu!apple!rutgers!sunybcs!oswego!news From: dl@g.g.oswego.edu (Doug Lea) Newsgroups: comp.sw.components Subject: Re: Garbage Collection & ADTs Message-ID: Date: 27 Sep 89 20:21:03 GMT References: <900@scaup.cl.cam.ac.uk> <6530@hubcap.clemson.edu> <909@scaup.cl.cam.ac.uk> <62342@tut.cis.ohio-state.edu> <599@hsi86.hsi.UUCP> <16270@brunix.UUCP> Sender: news@oswego.Oswego.EDU (Network News) Reply-To: dl@oswego.edu Organization: SUNY Oswego Lines: 285 In-reply-to: twl@brunix's message of 26 Sep 89 14:32:39 GMT First a warning, I do quite a lot of C++ programming and a perhaps slightly contentious definition, I take OO programming to be that which is in part or whole concerned - - | construction | with the | mutation | of objects | inspection | | destruction | - - (And, of course, OO design to be concerned with the specification, definition, interrelationships, etc., of such objects...) Only half-facetiously, one could make the vicious claim that GC removes programmer control over 1/4 of the OO programming paradigm: In C++, at least, the notion of destruction of an object may well require some structured, active, resource deallocation, and need not just consist of passive reclamation of memory cells. The classic example where such control is useful is a Window object that needs to erase itself from view upon destruction. In a GC-based system, the mechanisms for implemententing such actions seem less than natural -- either the programmer would need to `logically' destroy the window, and then let the collector sometime later reclaim space, or perhaps the collector could be designed to do all kinds of storage reclamation associated with the object at some indeterminate time. My points are just that Structured destruction is a good idea. (As is structured construction.) Garbage collection can get in the way of this. And again, half-facetiously: GC is sometimes a byproduct of functional programming methodologies that allow programmers to pay attention only to `values', and not the objects required in order to arrive at them. All that said, I think GC is a very useful tool! Really! It is a serious contender if Destruction consists only of releasing storage and The lifetimes of objects are so unstructured/complex that manual reclamation is either slower, or more error-prone than GC. This kind of situation comes up a lot. But often a redesign that forces a more structured lifetime discipline supporting simpler techniques is at least as attractive. I take that as Wulf's main point. I think it's a good one. How about a real example? For the amusement of C++ fans following this discussion, I'll append one below. It's a fun variation of a classic kind of situation where GC is commonly used. There are Cells that may be linked together to create computational pipelines. While I set it up to use GC, other choices are possible. Here are some of the not-terribly-profound issues that came to my mind while writing it as a classroom example: * The main() doesn't really exercise the collector. One chain is created and destroyed during the run. * If this were a one-shot closed design, culminating only in a cute but slow prime number sieve program, the best solution is probably to do nothing, letting all deletion occur at program termination, since there are no other useful aspects of destruction. * If this were still self-contained, but a variation of the current main() were used as a callable function, then it would make sense to insert code to mark the base of the chain, use a stack-based allocator, and release it on destruction via modified constructor/ destructor code. You'd also want to insert code to ensure that at most one chain were active at a time. * In fact, the *current* code using Sieve consists only of single chains of Cells. If only such applications were supported, * One could delete space via a virtual no-op destructor in Cell, and one in Filter like ~Filter() { delete source; } to cause a chain of deletions to occur on destruction. * Except that the Cells do not know if their sources were dynamically allocated or not, so `delete source' can be an error if the source is a local. Perhaps, Cells could refuse to support `local' creation, and only exist dynamically. Alternatively some bookkeeping could be added inside each Cell to record whether it was dynamically allocated, and constructors and destructors would need to use it. The former is probably a better strategy, since the implementation is not set up to cleanly support by-value argument passing of Cells, etc., anyway, and there seems to be every reason not to do so. * Digression: Actually, this is all a bit of a pain in C++. In order to avoid clients having to say `new Whatever', and thereafter dealing with pointers, you'd need to set up a parallel set of classes, each of which served as pointers to the `real' objects of the corresponding classes, and then took care of the operator `new' stuff on creation, etc. It's a case where the convention taken in Eiffel, Smalltalk, etc., that names are by default just bound to references to objects, not the objects themselves, would be more convenient. In these kinds of situations, C++ programmers often settle for requiring clients to use pointers or references explicitly. But even this has drawbacks, since, while you can force clients to only create objects via `new', the only means for doing this leaves the restriction as a mostly run-time, not compile-time error. * But it is very possible for someone to make two Filters point to the same source, or to create new subclasses with multiple sources or sinks. * The simple GC solution would continue to work in such cases. * In order to deal with this and still keep virtual destructors, some form of reference counting would be required. Even this won't suffice if forms of circular paths that give reference counters problems were created, but such possibilities seem extremely unlikely, and defensible as a documented restriction, akin to the builtin limitation that only, say 32767 references are allowed before an object is considered permanent. The work involved in adapting this code to use reference counts is not great, but is greater than using GC. The overhead for reference counting vs GC would probably be close, depending on the quality of the implementations. In either case, storage management would probably amount to a tiny percentage of execution time. * Except that it makes no sense in *this* code for anyone to hook up two Filters to the same source, so perhaps Cells should record whether they've been used as sources and refuse to comply if they are re-attached. But this is reference-counting, just used in a different way. * Except that maybe someone might want to make a subclass of Cells (e.g., random number generators) that would be useful to multiply tap. * In either case, reference-counting might be more flexible, since it could be used both for allocation and to guard against some kinds of Cells being multiply tapped when this is would be an error. * What about programmers creating subclasses that display themselves in action on a screen, or record their actions in a log file? In such cases the reference-counting version seems necessary, since destruction would require real-time screen updates or file manipulation. So, perhaps, reference-counting would be more appropriate, since it may better serve the goal of a robust, reusable, extensible design. I'll leave implementation as an exercise to the reader (in other words, I'm too lazy to do it out right now!) The example has no particular deep moral, except to reflect my feeling that thinking about the full lifetimes of objects is good for you! It helps focus attention on various aspects of the semantics of new types in ways you otherwise might not have thought much about. And further, that GC-oriented issues and reusability *do* interact, but that GC need not always be chosen on such grounds. I suppose it's worth a mention that the issues based on extensibility via inheritence (i.e., design reuse) don't much come up in Ada. Perhaps others could offer other analyses of this or other examples. ------------------------------- sieve.cc // Fun with the prime number sieve. // Based loosely on R. Sethi ``Programming Languages'', sec 6.7 // link to Boehm's comp.sources.unix C GC routines ... extern "C" void gc_init(); extern "C" void* gc_malloc(unsigned int); class GC // ... but wrap gc routines as a C++ class { public: GC() { gc_init(); } void* alloc(unsigned int sz) { return gc_malloc(sz); } }; class Cell // an object with state and a way to update and report it { private: static GC gc; // one collector for all Cells protected: int val; // the value to output next virtual void update() {} // the updater; default as no-op public: Cell(int initial=0) :val(initial) {} int next() { update(); return val; } // do one step // set class allocation ops to use gc void* operator new(unsigned int sz) { return gc.alloc(sz); } void operator delete(void*) {} }; GC Cell::gc = GC(); // static class member initialization // must be done outside class decl in C++ class Counter : public Cell // output consecutive integers { protected: void update() { ++val; } public: Counter(int initial=0) :Cell(initial) {} }; class Filter : public Cell // transform input from another Cell { protected: Cell* src; // input source int inp; // last input void get() { inp = src->next(); } public: Filter(Cell* s, int initial=0) :Cell(initial), src(s), inp(0) {} }; class ModFilter : public Filter // hold last input not divisible by divisor { private: int divisor; int divides() { return inp % divisor == 0; } public: ModFilter(Cell* s, int d) : Filter(s), divisor(d) {} void update() { do get(); while (divides()); val = inp; } }; class Sieve : public Filter // Serve as terminal node of a prime number sieve { public: Sieve() :Filter(new Counter(1)) {} void update() { get(); src = new ModFilter(src, val = inp); } }; #include main(int argc, char** argv) // exercise everything a little { if (argc < 2) { cerr << "error: enter desired number of primes as program argument\n"; exit(1); } int n = abs(atoi(argv[1])); Sieve s; for (int i = 0; i < n; ++i) cout << s.next() << "\n"; } Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367 email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl -- Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367 email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl