Path: utzoo!attcan!uunet!husc6!mailrus!cornell!uw-beaver!uw-june!david From: david@june.cs.washington.edu (David Callahan) Newsgroups: comp.lang.c++ Subject: Generic functions -- a step toward polymorphism Message-ID: <6403@june.cs.washington.edu> Date: 12 Nov 88 22:34:46 GMT Reply-To: david@uw-june.UUCP (David Callahan) Organization: U of Washington, Computer Science, Seattle Lines: 227 In the following, I assume a knowledge of Stroustrup's papers on Parameterized types and pointers to member functions published in the USENIX C++ Workshop 87/Conference 88 proceedings (I don't have them with me so I'm not sure which). There is an undeniable need for "parameterized" class definitions. The oft-used example of a "vector" class that has different base elements is compelling. Stroustrup has proposed syntax for parameterized classes and functions. His intent was that the compiler instantiate specific types and function implementations as needed. I think of this as a type-safe macro preprocessor. (This is simplified of course, please correct me if its actually wrong.) Example: template vector { T* data ; int length ; public: vector(int l) { data = new T[length] ; } T& operator[](int i) { return T[i] ; } } ; Even if the bodies of the member functions are defined "out of line", they must be instantiated for each type that appears as a parameter to a "vector" in the program. One of the problems with this approach is the proliferation of function implementations. In "vector", these functions are trivial and so the impact is small, but in general there may be a lot of essentially redundant code. Another problem is that it requires the source for all member functions of the parameterized type to be available at every compile. This makes libraries, especially libraries of proprietary software, more complex to use. A second example is an homogenous list: template list { class link { public: T* data ; link* Next ; link(T* d, link* n) { data = d ; Next = n; } link* next() { link* temp = this->Next ; delete this ; return temp ; } } *head ; public: void insert(T* d) { head = new link(d,head) ; } T* head() { T* temp = head->data ; head = head->Next() ; return temp ; } } ; (I don't know if that's quite right, but you get the idea). How do implementations of this example differ for list and list? They don't unless "int*" and "double*" have different sizes which I assume is not the case. The only operator applied here is pointer assignment without casting and "new". The implementation of "vector" does depend on "sizeof(T)" but the "list" template is truly polymorphic with respect to type T and it is undesirable to instantiate an implementation for each declared type (I know the member functions will be inlined, assume for the sake of discussion that I declared them out of line). Let me define the term "generic" to refer to a function implementation that can take different types for some argument and return different types of results as long as the types associated with each argument position are "structurally" identical. Let me reserve the term "polymorphic" for the more general case where all actual arguments share a set operations but their implementations may differ structurally. So, the "list" class is generic but the "vector" class is polymorphic. Languages like SmallTalk (correct me if I'm wrong here) do not use the structure of an argument object when a function is implemented. This allows functions to be polymorphic. C++ on the other hand binds types to structures at compile time and exploits that binding in function implementations. Thus, the most the language can support are generic function (I'll return to this latter). Currently, C++ programmers get generic functions by explicitly casting objects to base types, invoking functions on those base types and then casting the result back to a derived type. The above list example could be handled by replacing "T" with "void" everywhere and then deriving a new class for each parameter that simply performs type casting: class list : public list { public: int* head {return (int*) list::head(); } } ; Is this sufficient? Maybe. Is it cumbersome? I think so. Let me now make a proposal for a language extension to support generic function definition. The syntax is modeled after templates: generic< class T : S, class X : Y, class Z > name { // class definition referring to objects // of classes T, X, Z } ; The arguments indicate that "T" is derived from base class "S", "X" is derived from base class "Y" and "Z" is any class (derived from void?). There is also a requirement inside the class definition that only operations legal for "S" may be applied to arguments of type "T" and similarly, only operations legal for "Y" may be applied to arguments of type "X" and only operations legal for "void" (such as "&" to "void*") may be applied to arguments of type "Z". In particular, the declaration of a member function may refer to "T", "X", and "Z" only as types of arguments and the result type. The body of the member functions and the member data objects may not refer to "T", "X", and "Z" but may refer to "S", "Y" , and "void". One exception is allowed to this rule: the cast operation from "S*" to "T*" is defined and assignment of T*'s and application of the "->*" (pointer-to-member dereferencing) is allowed. When a member function is defined "out of line", the same "generic< ... >" specification must proceed the function definition. How is the conversion from S* to T* accomplished? In a single-inheritance implementation, we can assume that no modification to an S* will be needed to get a T*. In a multiple inheritance-implementation, the value of S* may need to be offset by an amount depending on the type T. These offset amounts could be held in a static table like virtual functions pointers are and their values determined at compile time. The effect of this implementation is to embed a very small amount of structural information in the object that can be used efficiently at runtime. "Polymorphic" functions can be simulated using generic functions if the generic functions are explicitly parameterized by all of the operations needed by the polymorphic function. The "pointer to member object" type provides a way to do this. Assume that we want to write a function that sums integer values from each object in a list. Using templates: template list : { ... int list_sum(list* L) { link* cur = L->head ; int sum = 0 ; while (cur != NIL) { sum += cur->data->value ; cur = cur->next ; } return sum ; } } ; Note that this will generate a compile-time type fault if "T" does not have a "value" field. Consider now a "generic" version: generic list : { ... int list_sum(int (T::*) valuepm) { link* cur = this->head ; int sum = 0 ; while (cur != NIL) { sum += ((T*)(cur->data))->*valuepm ; cur = cur->next ; } return sum ; } } /* example */ class temp { public: int i } ; list *L ; ... L->list_sum(&temp::i) ... Here we have added an extra parameter that selects a field. This is information that is dynamically available in SmallTalk but must be statically supplied in C++. A side effect of this transformation is that the actual name of the field is no longer used and so this function could be used to sum along different integer fields. The same effect could be had in the template case as well. To restore the field name to the type specification would require a more powerful pointer-to-member type structure that would allow you to specify a type that is a "pointer to member of class with name ". Perhaps something like: member "value" int (T::*) valuepm ; but I see no use for this since it only restricts the applicability of an otherwise generic function. I'd like to return briefly to the question of polymorphic functions, in particular, polymorphic implementations of "template" functions. The process of transforming a "template" function to a "generic" function could be done automatically (hence a need for pointers to members with particular names). It requires that all of the members used be known, which generates a compilation order problem which may include circular dependencies. Efficient implementations will also require that all types be known so that "constant" member functions (especially operators like "->") can be "compiled in" (one of the virtues of specifying base types for type parameters is that is limits what operations may vary, also note that we are assuming that operator "->*" can not be redefined). The problem is straightforward given an adequate programming environment and may be important if there is a significant space/time or debuggability/time or compile-time/time or copy-protection/time trade off. A more difficult problem is deciding when to replicate implementations (clone) to exploit parameter-specific information. The "template" implementation outlined by Stroustrup takes the approach "clone everywhere" which in a large program could be a problem. The transformation to polymorphic takes the other extreme: only one implementation. Which solution is "right" is application/machine/life cycle dependent and probably requires a meta-language solution. These are early thoughts and comments/questions are encouraged. David Callahan, Tera Computer Co., Seattle WA.