Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!ucselx!petunia!kestrel.edu!gyro
From: Gyro@Reasoning.COM
Newsgroups: comp.std.c++
Subject: Proposed: "closures"
Keywords: proposal, closures, function pointers
Message-ID: <1991Apr12.081539.22690@kestrel.edu>
Date: 12 Apr 91 08:15:39 GMT
Sender: gyro@kestrel.edu (Scott Layson Burson)
Distribution: comp.std.c++
Organization: Kestrel Institute, Palo Alto, CA
Lines: 415


Here is a draft of a proposal I intend to submit to the C++ committee.
Your comments are solicited.  (Also could someone please post the
procedure for formally submitting a proposal to the committee?
Thanks.)

[410 lines follow]


		     PROPOSAL FOR CLOSURES IN C++
			DRAFT   12 April 1991
			   Scott L. Burson
			 (Gyro@Reasoning.COM)


0. INTRODUCTION
===============

The following commentary appears on p. 71 of Ellis & Stroustrup, in
the discussion of pointers to member functions:

    Naturally, it would be possible to generalize the notion of a
    pointer to member to allow bound pointers, such as `ptr_to_obj->*
    ptr_to_mfct', to be stored and, in general, be treated as first
    class objects.  Doing so, however, would open vast opportunities
    for generalization and language extension in the general area of
    "what is a function and how can I call it?" and would require
    implementation techniques outside the realm of traditional C
    techniques.  It was felt that restraint was in order.

I beg, nay implore, the committee to reconsider this issue.  In what
follows I will argue:

 1) That such "bound pointers" (I will use "closure", the traditional
    Lisp term) have a number of uses for which the nearest substitutes
    are substantially less satisfactory.

 2) That implementing closures at the user level using the existing
    mechanisms of C++, while possible, is unsatisfactorily verbose.

 3) That closure creation and invocation operations fit extremely
    neatly, syntactically and semantically, into the language.

 4) Contrary to the above, that the natural implementation of closures
    is no more "outside the realm of traditional C techniques" than
    that of many existing C++ constructs, and that the costs are well
    within the bounds of acceptability.

 5) Contrary to the above, that quite the opposite of opening a can of
    worms or of adding a wart to the language, the semantics of
    closures very neatly fill a hole in the current definition of C++.

I will venture to guess that a proposal of this nature, even if it is
thought to be of some merit, will not be looked upon enthusiastically
by those members of the committee eager to reach closure, so to speak,
in the standardization process.  I would not take the time to present
such a proposal did I not believe C++ likely to become the most widely
used programming language ever, and did I not think it likely that the
omission of closures will ultimately cost users far more, in inelegant
code, unnecessary casting, and excessive module interdependencies,
than it will cost to include it at this point.


1. PROPOSED FUNCTIONALITY
=========================

But first, before I press this argument, let me say quite concretely
what I am proposing.  There are actually two logically separable
subproposals; while I personally prefer to think of them as a package,
it would be possible to adopt the first without the second, and the
committee might well prefer to do it that way.

 1) Proposal 1 introduces a new space of types `closure of T' where T
    is any function signature.  (The question of declaration syntax
    will be addressed below.)  A closure of T can be created in two
    ways.  The first way is by evaluating the name of (or applying the
    unary `&' to) the name of a global function or static member
    function of signature T; the resulting object supports a single
    operation, that of function invocation, and when so invoked
    behaves exactly as a pointer to T, created in the same way, would
    have behaved.  The second way to create a closure of T is, within
    the body of a nonstatic member function, to evaluate the name of
    (or apply unary `&' to) the name of a nonstatic member function
    (call it F) of signature T.  A closure created in this latter way,
    when invoked, performs the operation F, passing it the same value
    for `this' as was in force at the point of *creation* of the
    closure; that is, it invokes the member function F on the same
    object as the one that created the closure (F is said to have been
    "closed over" this object).

    A closure over an object can be validly invoked only so long as the
    object continues to exist.

    Note that unlike pointers to member functions, the various closure
    types are distinguished only by their signatures; a value of type
    `closure of T' can be created by any class, or by no class (when a
    global or static member function is involved).
    
 2) Proposal 2 unifies the domain `closure of T' with that of `pointer
    to function of signature T'.  That is, under proposal 2, the same
    syntax that has been interpreted as declaring a pointer to
    function of signature T will now be interpreted as declaring a
    closure of T.

If proposal 1 is adopted but not proposal 2, it will be necessary to
invent a declaration syntax for closures.  I prefer to avoid doing so
unless and until it may be seen to be the sense of the committee that
this is a likely outcome.

It may bear pointing out that the rule "a closure over an object can be
validly invoked only so long as the object continues to exist" is of the
same form as the corresponding rule concerning a normal pointer which
points to a member variable of an object.  For this reason, I consider
it a natural approach to take in C++.

It is also worth pointing out that this rule constrains the semantics of
closures in such a way that no unusual implementation techniques are
required for them.  (I suspect that when Ellis and Stroustrup mentioned
"implementation techniques outside the realm of traditional C
techniques" they had in mind the fact that closures in, e.g., Lisp have
indefinite extent, and therefore their implementation is intimately tied
to the fact that Lisp supports garbage collection.)


2. WHY CLOSURES ARE USEFUL
==========================

The key to the value of closures was mentioned above: they permit a kind
of abstraction that is not otherwise available in C++, by making it
possible to refer to an operation with regard only to its signature,
without specifying what class it is a closure over, or indeed even
whether it is a closure over an object at all.  So, for an (admittedly a
bit contrived) example (which assumes both subproposals are accepted):

  EXAMPLE 1
  =========
  class counter {
       int total;
       int count_up(int n) { return total += n; }
       int count_down(int n) { return total -= n; }
     public:
       counter(int init) { total = init; }
       int (*count_up_fn())(int) { return &count_up; }
       int (*count_down_fn())(int) { return &count_down; }
  };

  class doubling_counter {
       int current;
       int add_next(int n) { current *= 2; return current += n; }
     public:
       doubling_counter(int init) { current = init; }
       int (*add_next_fn())(int) { return &add_next; }
  };

  int twice(int a) { return a * 2; }

  void print_some(int (*pf)(int), int n)
  {
       for (int i = 0; i < n; ++i) cout << pf(i) << " ";
       cout << "\n";
  }

  mumble()
  {
       print_some(&twice, 5);
       counter c(0);
       print_some(c.count_up_fn(), 5);
       print_some(c.count_down_fn(), 10);
       doubling_counter dc(0);
       print_some(dc.add_next_fn(), 7);
  }

Output of mumble():

  0 2 4 6 8
  0 1 3 6 10
  10 9 7 4 0 -5 -11 -18 -26 -35
  0 1 4 11 26 57 120

It can be seen from the extreme elegance of this code that closure
creation and invocation can fit very neatly indeed into the syntax and
semantics of C++.

Here, by way of contrast, is how the same example would be written using
a user-defined closure class; this shows what one would have to do to
write this code in the same style in C++ as it currently exists.

  EXAMPLE 2
  =========
  class closure_int_int {
    public:
       virtual int operator()(int arg) = 0;
  };

  class global_closure_int_int: public closure_int_int {
       int (*fn)(int);
    public:
       global_closure_int_int(int (*_fn)(int)) { fn = _fn; }
       int operator()(int arg) { return fn(arg); }
  };

  class counter;

  class counter_closure_int_int: public closure_int_int {
       int (counter::* pmf)(int);
       counter& c;
    public:
       counter_closure_int_int(counter& _c, int (counter::*_pmf)(int)) :
	  c(_c), pmf(_pmf) {}
       int operator()(int arg) { return (c.*pmf)(arg); }
  };

  class counter {
       int total;
       int count_up(int n) { return total += n; }
       int count_down(int n) { return total -= n; }
     public:
       counter(int init) { total = init; }
       counter_closure_int_int count_up_fn()
	  { return counter_closure_int_int(*this, &counter::count_up); }
       counter_closure_int_int count_down_fn()
	  { return counter_closure_int_int(*this, &counter::count_down); }
  };

  class doubling_counter;

  class doubling_counter_closure_int_int: public closure_int_int {
       int (doubling_counter::* pmf)(int);
       doubling_counter& c;
    public:
       doubling_counter_closure_int_int
	 (doubling_counter& _c, int (doubling_counter::*_pmf)(int)) :
	   c(_c), pmf(_pmf) {}
       int operator()(int arg) { return (c.*pmf)(arg); }
  };

  class doubling_counter {
       int current;
       int add_next(int n) { current *= 2; return current += n; }
     public:
       doubling_counter(int init) { current = init; }
       doubling_counter_closure_int_int add_next_fn()
	  { return doubling_counter_closure_int_int
		     (*this, &doubling_counter::add_next); }
  };

  int twice(int a) { return a * 2; }

  void print_some(closure_int_int& cl, int n)
  {
       for (int i = 0; i < n; ++i) cout << cl(i) << " ";
       cout << "\n";
  }

  mumble()
  {
       global_closure_int_int cl_twice(&twice);
       print_some(cl_twice, 5);
       counter c(0);
       counter_closure_int_int cl_up = c.count_up_fn();
       print_some(cl_up, 5);
       counter_closure_int_int cl_down = c.count_down_fn();
       print_some(cl_down, 10);
       doubling_counter dc(0);
       doubling_counter_closure_int_int dc_add = dc.add_next_fn();
       print_some(dc_add, 7);
  }

(I believe this code to be correct, but haven't tested it -- it shows
the general idea, anyway.)

There are several things to be pointed out about this example:

 1) It takes 78 lines to do what Example 1 did in 35.  It is also much
    less readable.

 2) One base class, in this case `closure_int_int', must be defined for
    each signature of closure to be created.

 3) One derived class must be created for each class to be closed over,
    plus an additional one for "closures" of global functions if
    desired.  (Derived classes of, e.g., `counter' could share `counter_
    closure_int_int', however.)

In practice, therefore, it is unlikely that anyone would actually write
a piece of code like Example 2, even if they had a good reason to.  An
opportunity for potentially valuable abstraction would therefore be
missed.  Let's look at some examples of applications for closures:

 1) General-purpose higher order functions (functions that take
    functions as arguments).  `print_some' above is a fairly
    uninteresting example, but it is not hard to think of better ones.
    Consider for instance a class for holding sequences of values; it
    might well support an operation, perhaps called `image', that
    takes a closure and returns a new sequence of the same type as
    itself, where each element is the value the closure returns when
    called on the corresponding element of the original sequence.

 2) A general-purpose error handling facility, which needs to be able
    to perform operations like restarting the erroneous computation,
    and which clearly should be able to be written and compiled
    independently of the classes it may eventually be invoked from.

 3) Closures are handy whenever there is a reason to write the control
    structure of a piece of code in a way that is "upside down" with
    respect to the most natural form.  Macintosh programmers, for
    instance, could use closures to write a general event loop
    facility, which would maintain a dynamically updatable collection
    of closures each with a representation of the kinds of events that
    should cause it to be invoked.

To summarize, the ultimate value that closures can contribute to C++
programs seems to boil down to two primary areas:

 1) In any case where pointers to functions are being used, closures
    add a very substantial generalization.

 2) Because the type of a closure is independent of the types of the
    objects that may create it, closures are a tool for reducing the
    dependencies between C++ modules.

The bottom line, then, is code that is more elegant and more modular.


3. IMPLEMENTATION CONSIDERATIONS
================================

Here I attempt to show that a closure facility such as I propose can be
implemented with perfectly acceptable efficiency, in the C++ tradition,
and with minimal changes to an imagined typical C++ compiler.

A closure can be implemented as a pair of: 1) a pointer to function in
the C sense, and 2) a pointer which will be passed as the value of
`this'.  In a typical C++ implementation, this translates into two 32-bit
words, where a C pointer to function takes only one word.  Thus, the
primary costs of changing all pointers to functions into closures are a
doubling of the space they require, plus the additional time required to
move the extra word when assigning them to variables or passing them to
or from functions.  However, a closure will (typically) be only as large
as a `double', which even in the early 16-bit implementations of C was
not considered too large to be frequently passed to and returned from
functions.

The question then arises: how do we deal with the slightly different
calling protocols for top-level functions and member functions, that is,
the fact that the C functions that implement member functions expect a
`this' pointer to be passed as an additional argument, while top-level
functions do not?  Several possibilities suggest themselves:

 1) Arrange that all top-level functions shall take an additional
    argument, (typically) at the beginning of the argument list, which
    they ignore.  There is then no need for the code that invokes a
    closure to make a runtime determination of which kind is being
    called.

 2) Do not change the conventions for top-level functions; instead, when
    creating a closure of a top-level function, be sure that the value
    stored in the `this' pointer is null.  Then have the closure
    invocation sequence check that pointer, and use different calling
    protocols depending on whether it is null (top-level function case)
    or not (member function case):

      (closure.this ? (*closure.pfn)(closure.this, <args>) :
                      (*closure.pfn)(<args>))

 3) Establish a convention for member function invocation such that the
    presence of the `this' argument does not change the protocol for
    passing any other arguments; that is, so that `this' is passed in
    some way outside the normal argument passing mechanisms.  Then the
    closure invocation sequence need make no runtime distinction between
    the top-level and member function cases.

 4) For each top-level function, define an interface function that takes
    the `this' argument (typically) at the beginning of its argument
    list, and then calls the top-level function with its correct
    argument list (i.e., without `this').  Then the closure invocation
    sequence need make no runtime distinction between the top-level and
    member function cases.

Of the first three, I will venture to guess that (1) will be the least
popular, although there are high-performance Scheme implementations that
work exactly this way.  Besides the overhead of adding an argument to
every top-level function, it doesn't work for top-level functions with C
linkage, where the option of adding an argument is not available
(although approach (4) could perhaps be used to cover that case).

Native C++ implementations will likely select (3), it seems to me,
although it is not restricted to them; Cfront, for instance, could pass
`this' in a global variable, using an explicit (i.e. C source level)
caller-saves protocol.

My guess, however, is that (2) will be the choice for translator-based
implementations such as Cfront.  (2) can work very nicely also for
native implementations that pass all arguments on the stack: the
generated code can optionally push the `this' pointer, then continue
with the normal calling sequence.

(4) initially seems attractive in this context, but it is incapable of
handling variable-length argument lists (though certain native
implementations may be able to solve this problem); perhaps, in order
explicitly to permit implementors the option of using it, closures whose
signatures include the ellipsis should be disallowed.


4. CONCLUSION
=============

I have presented a proposal for the introduction of closures into C++,
and a closely related proposal to the effect that they should replace
pointers to functions as they currently exist.  I have shown that they
permit a form of abstraction that is otherwise unavailable in the
language.  I have further shown that they are neither difficult nor
expensive to implement in the form proposed.

I therefore propose that closures be added to C++.