Path: utzoo!attcan!uunet!husc6!mailrus!cornell!uw-beaver!uw-june!david
From: david@june.cs.washington.edu (David Callahan)
Newsgroups: comp.lang.c++
Subject: Generic functions -- a step toward polymorphism
Message-ID: <6403@june.cs.washington.edu>
Date: 12 Nov 88 22:34:46 GMT
Reply-To: david@uw-june.UUCP (David Callahan)
Organization: U of Washington, Computer Science, Seattle
Lines: 227


In the following, I assume a knowledge of Stroustrup's
papers on Parameterized types and pointers to member functions
published in the USENIX C++ Workshop 87/Conference 88 proceedings
(I don't have them with me so I'm not sure which).

There is an undeniable need for "parameterized" class definitions.
The oft-used example of a "vector" class that has different
base elements is compelling. Stroustrup has proposed syntax
for parameterized classes and functions. His intent was
that the compiler instantiate specific types and function 
implementations as needed. I think of this as a type-safe
macro preprocessor. (This is simplified of course, please
correct me if its actually wrong.)

Example:

	template<class T> vector {
		T* data ;
		int length ;
	public:
		vector(int l) { data = new T[length] ; }
		T& operator[](int i) { return T[i] ; }
	} ;

Even if the bodies of the member functions are defined 
"out of line", they must be instantiated for each type that 
appears as a parameter to a "vector" in the program.
One of the problems with this approach is the proliferation
of function implementations. In "vector", these functions are
trivial and so the impact is small, but in general
there may be a lot of essentially redundant code.
Another problem is that it requires the source for all
member functions of the parameterized type to be
available at every compile. This makes libraries,
especially libraries of proprietary software, 
more complex to use.

A second example is an homogenous list:

	template<class T> list {
		class link {
		public:
			T* data ; link* Next ;
			link(T* d, link* n) 
				{ data = d ; Next = n;  } 
			link* next() {
				link* temp = this->Next ;
				delete this ;
				return temp ;
			} 
		} *head ;
	public: 
		void insert(T* d) {
			head = new link(d,head) ;
			}
		T* head() {
			T* temp  = head->data ;
			head = head->Next() ;
			return temp ;
		}
	} ;

(I don't know if that's quite right, but you get the idea).
How do implementations of this example differ for list<int> and
list<double>? They don't unless "int*" and "double*" have different
sizes which I assume is not the case. 
The only operator applied here is pointer assignment without
casting and "new". The implementation of "vector<T>" does depend on 
"sizeof(T)" but the "list<T>" template is truly polymorphic with
respect to type T and it is undesirable to instantiate an implementation
for each declared type (I know the member functions will be inlined,
assume for the sake of discussion that I declared them out of line).

Let me define the term "generic" to refer to a function implementation
that can take different types for some argument and return different
types of results as long as the types associated with each argument
position are "structurally" identical. Let me reserve the term
"polymorphic" for the more general case where all actual arguments share 
a set operations but their implementations may differ structurally.
So, the "list<T>" class is generic but the "vector<T>" 
class is polymorphic.

Languages like SmallTalk (correct me if I'm wrong here) do not 
use the structure of an argument object when a function is implemented.
This allows functions to be polymorphic. C++ on the other hand binds
types to structures at compile time and exploits that binding in function
implementations. Thus, the most the language can support are 
generic function (I'll return to this latter).

Currently, C++ programmers  get generic functions by explicitly
casting objects to base types, invoking functions on those base
types and then casting the result back to a derived type. The above
list example could be handled by replacing "T" with "void"
everywhere and then deriving a new class for each parameter
that simply performs type casting:

	class list<int> : public list<void> {
	public:
		int* head {return (int*) list<void>::head(); }
	} ;

Is this sufficient? Maybe. Is it cumbersome? I think so.
Let me now make a proposal for a language extension to support
generic function definition. The syntax is modeled after templates:

	generic< class T : S, class X : Y, class Z > name { 
	 	// class definition referring to objects 
		// of classes T, X, Z
	} ;

The arguments indicate that "T" is derived from base class "S",
"X" is derived from base class "Y" and "Z" is any class (derived
from void?). There is also a requirement inside the class 
definition that only operations legal for "S" may be applied 
to arguments of type  "T" and similarly, only operations legal 
for "Y" may be applied to arguments of type "X" and only operations 
legal for "void" (such as "&" to "void*") may be applied to arguments 
of type "Z". In particular, the declaration of a member function may 
refer to "T", "X", and "Z" only as types of arguments and
the result type. The body of the member functions and the 
member data objects may not refer to "T", "X", and "Z" but
may refer to "S", "Y" , and "void". One exception is allowed
to this rule: the cast operation from "S*" to "T*" is defined
and assignment of T*'s and application of 
the "->*" (pointer-to-member dereferencing)
is allowed.  When a member function 
is defined "out of line", the same "generic< ... >" 
specification must proceed the function definition.

How is the conversion from S* to T* accomplished? In 
a single-inheritance implementation, we can assume that
no modification to an S* will be needed to get a T*. 
In a multiple inheritance-implementation, 
the value of S* may need to be offset by an amount depending on 
the type T. These offset amounts could be held in a
static table like virtual functions pointers are and their 
values determined at compile time. The effect of this implementation 
is to embed a very small amount of structural information in the object
that can be used efficiently at runtime.

"Polymorphic" functions can be simulated using generic functions
if the generic functions are explicitly parameterized by all of
the operations needed by the polymorphic function. The 
"pointer to member object" type provides a way to do this.
Assume that we want to write a function that sums integer
values from each object in a list. Using templates:

	template<class T> list : { 
		... 
		int list_sum(list<T>* L) {
	 		link* cur = L->head ;
			int sum = 0 ;
			while (cur != NIL) {
				sum += cur->data->value ;
				cur = cur->next ;
			}
			return sum ;
		} 
	} ;

Note that this will generate a compile-time type fault if 
"T" does not have a "value" field. Consider now a "generic"
version:

	generic<class T> list : { 
		...
		int list_sum(int (T::*) valuepm) {
	 		link* cur = this->head ;
			int sum = 0 ;
			while (cur != NIL) {
				sum += ((T*)(cur->data))->*valuepm ;
				cur = cur->next ;
			}
			return sum ;
		} 
	}
   /* example */ 
	class temp { public: int i } ;
	list<temp> *L ; 
	... L->list_sum(&temp::i) ... 

Here we have added an extra parameter that selects a field.
This is information that is dynamically available in SmallTalk
but must be statically supplied in C++. A side effect of this
transformation is that the actual name of the field is no longer
used and so this function could be used to sum along different
integer fields. The same effect could be had in the template 
case as well. To restore the field name to the type specification
would require a more powerful pointer-to-member type structure that
would allow you to specify a type that is a "pointer to <type> member
of class <class> with name <name>". Perhaps something like:
	member "value" int  (T::*) valuepm ;
but I see no use for this since it only restricts the applicability
of an otherwise generic function.

I'd like to return briefly to the question of polymorphic
functions, in particular, polymorphic implementations of 
"template" functions. The process of transforming a "template"
function to a "generic" function could be done automatically
(hence a need for pointers to members with particular names).
It requires that all of the members used be known, which 
generates a compilation order problem which may include
circular dependencies. Efficient implementations will also require
that all types be known so that "constant" member functions
(especially operators like "->") can be "compiled in" (one of 
the virtues of specifying base types for type parameters
is that is limits what operations may vary, also note that
we are assuming that operator "->*" can not be redefined).
The problem is straightforward given an adequate programming
environment and may be important if there is a significant 
space/time or debuggability/time or compile-time/time or
copy-protection/time trade off.

A more difficult problem is deciding when to replicate
implementations (clone) to exploit parameter-specific
information. The "template" implementation outlined by 
Stroustrup takes the approach "clone everywhere" which
in a large program could be a problem. The transformation
to polymorphic takes the other extreme: only one implementation.
Which solution is "right" is application/machine/life cycle
dependent and probably requires a meta-language solution.

These are early thoughts and comments/questions are 
encouraged.

David Callahan, Tera Computer Co., Seattle WA.