Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!ucselx!petunia!kestrel.edu!gyro From: Gyro@Reasoning.COM Newsgroups: comp.std.c++ Subject: Proposed: "closures" Keywords: proposal, closures, function pointers Message-ID: <1991Apr12.081539.22690@kestrel.edu> Date: 12 Apr 91 08:15:39 GMT Sender: gyro@kestrel.edu (Scott Layson Burson) Distribution: comp.std.c++ Organization: Kestrel Institute, Palo Alto, CA Lines: 415 Here is a draft of a proposal I intend to submit to the C++ committee. Your comments are solicited. (Also could someone please post the procedure for formally submitting a proposal to the committee? Thanks.) [410 lines follow] PROPOSAL FOR CLOSURES IN C++ DRAFT 12 April 1991 Scott L. Burson (Gyro@Reasoning.COM) 0. INTRODUCTION =============== The following commentary appears on p. 71 of Ellis & Stroustrup, in the discussion of pointers to member functions: Naturally, it would be possible to generalize the notion of a pointer to member to allow bound pointers, such as `ptr_to_obj->* ptr_to_mfct', to be stored and, in general, be treated as first class objects. Doing so, however, would open vast opportunities for generalization and language extension in the general area of "what is a function and how can I call it?" and would require implementation techniques outside the realm of traditional C techniques. It was felt that restraint was in order. I beg, nay implore, the committee to reconsider this issue. In what follows I will argue: 1) That such "bound pointers" (I will use "closure", the traditional Lisp term) have a number of uses for which the nearest substitutes are substantially less satisfactory. 2) That implementing closures at the user level using the existing mechanisms of C++, while possible, is unsatisfactorily verbose. 3) That closure creation and invocation operations fit extremely neatly, syntactically and semantically, into the language. 4) Contrary to the above, that the natural implementation of closures is no more "outside the realm of traditional C techniques" than that of many existing C++ constructs, and that the costs are well within the bounds of acceptability. 5) Contrary to the above, that quite the opposite of opening a can of worms or of adding a wart to the language, the semantics of closures very neatly fill a hole in the current definition of C++. I will venture to guess that a proposal of this nature, even if it is thought to be of some merit, will not be looked upon enthusiastically by those members of the committee eager to reach closure, so to speak, in the standardization process. I would not take the time to present such a proposal did I not believe C++ likely to become the most widely used programming language ever, and did I not think it likely that the omission of closures will ultimately cost users far more, in inelegant code, unnecessary casting, and excessive module interdependencies, than it will cost to include it at this point. 1. PROPOSED FUNCTIONALITY ========================= But first, before I press this argument, let me say quite concretely what I am proposing. There are actually two logically separable subproposals; while I personally prefer to think of them as a package, it would be possible to adopt the first without the second, and the committee might well prefer to do it that way. 1) Proposal 1 introduces a new space of types `closure of T' where T is any function signature. (The question of declaration syntax will be addressed below.) A closure of T can be created in two ways. The first way is by evaluating the name of (or applying the unary `&' to) the name of a global function or static member function of signature T; the resulting object supports a single operation, that of function invocation, and when so invoked behaves exactly as a pointer to T, created in the same way, would have behaved. The second way to create a closure of T is, within the body of a nonstatic member function, to evaluate the name of (or apply unary `&' to) the name of a nonstatic member function (call it F) of signature T. A closure created in this latter way, when invoked, performs the operation F, passing it the same value for `this' as was in force at the point of *creation* of the closure; that is, it invokes the member function F on the same object as the one that created the closure (F is said to have been "closed over" this object). A closure over an object can be validly invoked only so long as the object continues to exist. Note that unlike pointers to member functions, the various closure types are distinguished only by their signatures; a value of type `closure of T' can be created by any class, or by no class (when a global or static member function is involved). 2) Proposal 2 unifies the domain `closure of T' with that of `pointer to function of signature T'. That is, under proposal 2, the same syntax that has been interpreted as declaring a pointer to function of signature T will now be interpreted as declaring a closure of T. If proposal 1 is adopted but not proposal 2, it will be necessary to invent a declaration syntax for closures. I prefer to avoid doing so unless and until it may be seen to be the sense of the committee that this is a likely outcome. It may bear pointing out that the rule "a closure over an object can be validly invoked only so long as the object continues to exist" is of the same form as the corresponding rule concerning a normal pointer which points to a member variable of an object. For this reason, I consider it a natural approach to take in C++. It is also worth pointing out that this rule constrains the semantics of closures in such a way that no unusual implementation techniques are required for them. (I suspect that when Ellis and Stroustrup mentioned "implementation techniques outside the realm of traditional C techniques" they had in mind the fact that closures in, e.g., Lisp have indefinite extent, and therefore their implementation is intimately tied to the fact that Lisp supports garbage collection.) 2. WHY CLOSURES ARE USEFUL ========================== The key to the value of closures was mentioned above: they permit a kind of abstraction that is not otherwise available in C++, by making it possible to refer to an operation with regard only to its signature, without specifying what class it is a closure over, or indeed even whether it is a closure over an object at all. So, for an (admittedly a bit contrived) example (which assumes both subproposals are accepted): EXAMPLE 1 ========= class counter { int total; int count_up(int n) { return total += n; } int count_down(int n) { return total -= n; } public: counter(int init) { total = init; } int (*count_up_fn())(int) { return &count_up; } int (*count_down_fn())(int) { return &count_down; } }; class doubling_counter { int current; int add_next(int n) { current *= 2; return current += n; } public: doubling_counter(int init) { current = init; } int (*add_next_fn())(int) { return &add_next; } }; int twice(int a) { return a * 2; } void print_some(int (*pf)(int), int n) { for (int i = 0; i < n; ++i) cout << pf(i) << " "; cout << "\n"; } mumble() { print_some(&twice, 5); counter c(0); print_some(c.count_up_fn(), 5); print_some(c.count_down_fn(), 10); doubling_counter dc(0); print_some(dc.add_next_fn(), 7); } Output of mumble(): 0 2 4 6 8 0 1 3 6 10 10 9 7 4 0 -5 -11 -18 -26 -35 0 1 4 11 26 57 120 It can be seen from the extreme elegance of this code that closure creation and invocation can fit very neatly indeed into the syntax and semantics of C++. Here, by way of contrast, is how the same example would be written using a user-defined closure class; this shows what one would have to do to write this code in the same style in C++ as it currently exists. EXAMPLE 2 ========= class closure_int_int { public: virtual int operator()(int arg) = 0; }; class global_closure_int_int: public closure_int_int { int (*fn)(int); public: global_closure_int_int(int (*_fn)(int)) { fn = _fn; } int operator()(int arg) { return fn(arg); } }; class counter; class counter_closure_int_int: public closure_int_int { int (counter::* pmf)(int); counter& c; public: counter_closure_int_int(counter& _c, int (counter::*_pmf)(int)) : c(_c), pmf(_pmf) {} int operator()(int arg) { return (c.*pmf)(arg); } }; class counter { int total; int count_up(int n) { return total += n; } int count_down(int n) { return total -= n; } public: counter(int init) { total = init; } counter_closure_int_int count_up_fn() { return counter_closure_int_int(*this, &counter::count_up); } counter_closure_int_int count_down_fn() { return counter_closure_int_int(*this, &counter::count_down); } }; class doubling_counter; class doubling_counter_closure_int_int: public closure_int_int { int (doubling_counter::* pmf)(int); doubling_counter& c; public: doubling_counter_closure_int_int (doubling_counter& _c, int (doubling_counter::*_pmf)(int)) : c(_c), pmf(_pmf) {} int operator()(int arg) { return (c.*pmf)(arg); } }; class doubling_counter { int current; int add_next(int n) { current *= 2; return current += n; } public: doubling_counter(int init) { current = init; } doubling_counter_closure_int_int add_next_fn() { return doubling_counter_closure_int_int (*this, &doubling_counter::add_next); } }; int twice(int a) { return a * 2; } void print_some(closure_int_int& cl, int n) { for (int i = 0; i < n; ++i) cout << cl(i) << " "; cout << "\n"; } mumble() { global_closure_int_int cl_twice(&twice); print_some(cl_twice, 5); counter c(0); counter_closure_int_int cl_up = c.count_up_fn(); print_some(cl_up, 5); counter_closure_int_int cl_down = c.count_down_fn(); print_some(cl_down, 10); doubling_counter dc(0); doubling_counter_closure_int_int dc_add = dc.add_next_fn(); print_some(dc_add, 7); } (I believe this code to be correct, but haven't tested it -- it shows the general idea, anyway.) There are several things to be pointed out about this example: 1) It takes 78 lines to do what Example 1 did in 35. It is also much less readable. 2) One base class, in this case `closure_int_int', must be defined for each signature of closure to be created. 3) One derived class must be created for each class to be closed over, plus an additional one for "closures" of global functions if desired. (Derived classes of, e.g., `counter' could share `counter_ closure_int_int', however.) In practice, therefore, it is unlikely that anyone would actually write a piece of code like Example 2, even if they had a good reason to. An opportunity for potentially valuable abstraction would therefore be missed. Let's look at some examples of applications for closures: 1) General-purpose higher order functions (functions that take functions as arguments). `print_some' above is a fairly uninteresting example, but it is not hard to think of better ones. Consider for instance a class for holding sequences of values; it might well support an operation, perhaps called `image', that takes a closure and returns a new sequence of the same type as itself, where each element is the value the closure returns when called on the corresponding element of the original sequence. 2) A general-purpose error handling facility, which needs to be able to perform operations like restarting the erroneous computation, and which clearly should be able to be written and compiled independently of the classes it may eventually be invoked from. 3) Closures are handy whenever there is a reason to write the control structure of a piece of code in a way that is "upside down" with respect to the most natural form. Macintosh programmers, for instance, could use closures to write a general event loop facility, which would maintain a dynamically updatable collection of closures each with a representation of the kinds of events that should cause it to be invoked. To summarize, the ultimate value that closures can contribute to C++ programs seems to boil down to two primary areas: 1) In any case where pointers to functions are being used, closures add a very substantial generalization. 2) Because the type of a closure is independent of the types of the objects that may create it, closures are a tool for reducing the dependencies between C++ modules. The bottom line, then, is code that is more elegant and more modular. 3. IMPLEMENTATION CONSIDERATIONS ================================ Here I attempt to show that a closure facility such as I propose can be implemented with perfectly acceptable efficiency, in the C++ tradition, and with minimal changes to an imagined typical C++ compiler. A closure can be implemented as a pair of: 1) a pointer to function in the C sense, and 2) a pointer which will be passed as the value of `this'. In a typical C++ implementation, this translates into two 32-bit words, where a C pointer to function takes only one word. Thus, the primary costs of changing all pointers to functions into closures are a doubling of the space they require, plus the additional time required to move the extra word when assigning them to variables or passing them to or from functions. However, a closure will (typically) be only as large as a `double', which even in the early 16-bit implementations of C was not considered too large to be frequently passed to and returned from functions. The question then arises: how do we deal with the slightly different calling protocols for top-level functions and member functions, that is, the fact that the C functions that implement member functions expect a `this' pointer to be passed as an additional argument, while top-level functions do not? Several possibilities suggest themselves: 1) Arrange that all top-level functions shall take an additional argument, (typically) at the beginning of the argument list, which they ignore. There is then no need for the code that invokes a closure to make a runtime determination of which kind is being called. 2) Do not change the conventions for top-level functions; instead, when creating a closure of a top-level function, be sure that the value stored in the `this' pointer is null. Then have the closure invocation sequence check that pointer, and use different calling protocols depending on whether it is null (top-level function case) or not (member function case): (closure.this ? (*closure.pfn)(closure.this, ) : (*closure.pfn)()) 3) Establish a convention for member function invocation such that the presence of the `this' argument does not change the protocol for passing any other arguments; that is, so that `this' is passed in some way outside the normal argument passing mechanisms. Then the closure invocation sequence need make no runtime distinction between the top-level and member function cases. 4) For each top-level function, define an interface function that takes the `this' argument (typically) at the beginning of its argument list, and then calls the top-level function with its correct argument list (i.e., without `this'). Then the closure invocation sequence need make no runtime distinction between the top-level and member function cases. Of the first three, I will venture to guess that (1) will be the least popular, although there are high-performance Scheme implementations that work exactly this way. Besides the overhead of adding an argument to every top-level function, it doesn't work for top-level functions with C linkage, where the option of adding an argument is not available (although approach (4) could perhaps be used to cover that case). Native C++ implementations will likely select (3), it seems to me, although it is not restricted to them; Cfront, for instance, could pass `this' in a global variable, using an explicit (i.e. C source level) caller-saves protocol. My guess, however, is that (2) will be the choice for translator-based implementations such as Cfront. (2) can work very nicely also for native implementations that pass all arguments on the stack: the generated code can optionally push the `this' pointer, then continue with the normal calling sequence. (4) initially seems attractive in this context, but it is incapable of handling variable-length argument lists (though certain native implementations may be able to solve this problem); perhaps, in order explicitly to permit implementors the option of using it, closures whose signatures include the ellipsis should be disallowed. 4. CONCLUSION ============= I have presented a proposal for the introduction of closures into C++, and a closely related proposal to the effect that they should replace pointers to functions as they currently exist. I have shown that they permit a form of abstraction that is otherwise unavailable in the language. I have further shown that they are neither difficult nor expensive to implement in the form proposed. I therefore propose that closures be added to C++.